Abstract

Let $(S_{1,i}, S_{2,i}) \sim$ i.i.d. $p(s_1, s_2)$, $i = 1, 2, \ldots$ be a memoryless, correlated partial side information sequence.
In this work we study channel coding and source coding problems where the partial side information $(S_1, S_2)$ is
available at the encoder and the decoder, respectively, and, additionally, either the encoder's or the decoder's side
information is increased by a limited-rate description of the other's partial side information. We derive six special
cases of channel coding and source coding problems and we characterize the capacity and the rate-distortion functions
for the different cases. We present a duality between the channel capacity and the rate-distortion cases we study. In
order to find numerical solutions for our channel capacity and rate-distortion problems, we use the Blahut-Arimoto
algorithm and convex optimization tools. As a byproduct of our work, we found a tight lower bound on the Wyner-Ziv
solution by formulating its Lagrange dual as a geometric program. Previous results in the literature provide a
geometric programming formulation that is only a lower bound, but not necessarily tight. Finally, we provide several
examples corresponding to the channel capacity and the rate-distortion cases we presented.
Index Terms: Blahut-Arimoto algorithm, channel capacity, channel coding, convex optimization, duality, Gelfand-Pinsker channel coding,
geometric programming, partial side information, rate-distortion, source coding, Wyner-Ziv source coding.



I. Introduction 

In this paper we investigate point-to-point channel models and rate-distortion problem models where both users
have different and correlated partial side information and where, in addition, a rate-limited description of one
user's side information is delivered to the other user. We then show the duality between the channel models and
the rate-distortion models we investigate. In the process of investigating the rate-distortion problems, we found a
tight lower bound on the rate-distortion of the Wyner-Ziv [1] problem. We show here that it is possible to write the
Lagrange dual of the Wyner-Ziv rate-distortion function as a geometric program. Then, we show that the optimal
solution of this geometric program is the correct solution of the Wyner-Ziv problem.

For the convenience of the reader, we refer to the state information as the side information, to the partial side
information that is available to the encoder as the encoder's side information (ESI) and to the partial side information
that is available to the decoder as the decoder's side information (DSI). We refer to the rate-limited description of the
other user's side information as the increase in the side information. For example, if the decoder is informed
with its DSI and, in addition, with a rate-limited description of the ESI, then we say that the decoder is
informed with increased DSI.

Fig. 1: Increased partial side information example. The encoder wants to send a message to the decoder over an interrupted
channel in the presence of side information. The encoder is provided with the ESI and the decoder is provided with increased
DSI, i.e., the decoder is informed with a rate-limited description of the ESI in addition to the DSI.

To make the motivation for this paper clear, let us look at a simple example, as depicted in Figure 1. Two
remote users, User 1 (the encoder) and User 2 (the decoder), want to communicate over a channel
that is being interrupted by two interrupters, Interrupter 1 and Interrupter 2. We allow the interruptions $S_1$ and $S_2$
generated by the interrupters to be correlated, i.e., $(S_1, S_2) \sim p(s_1, s_2)$. Assume that Interrupter 1 is located in close
proximity to User 1 and can fully describe its future interruption, $S_1$, to User 1 and that Interrupter 2 is located
in close proximity to User 2 and can also fully describe its future interruption, $S_2$, to User 2. In addition, assume
that Interrupter 1 can increase the side information of User 2 with rate-limited information about its interruption.
In these circumstances, we pose the question: what is the capacity of the channel between User 1 and User 2? We
discuss the answer to this question extensively in the forthcoming sections.



A. Channel capacity in the presence of state information 

The three problems of channel capacity in the presence of state information that we address in this paper are
presented in Figure 2. We make the assumption that the encoder is informed with partial state information, the ESI
($S_1$), and the decoder is informed with different, but correlated, partial state information, which is the DSI ($S_2$).
The channel capacity problem cases are:

• Case 1: The decoder is provided with increased DSI; i.e., in addition to the DSI, the decoder is also informed
with a rate-limited description of the ESI.

• Case 2: The encoder is informed with increased ESI.

• Case 2c: Similar to Case 2, with the exception that the ESI is known to the encoder in a causal manner.
Notice that the rate-limited description of the DSI is still known to the encoder noncausally.



We will subsequently provide the capacity of Case 1 and Case 2c and characterize the lower and the upper bounds
on Case 2, which differ only by a Markov relation. The results for the first case under discussion, Case 1, can be
concluded from Steinberg's problem [2]. In [2], Steinberg introduced and solved the case in which the encoder is
fully informed with the ESI and the decoder is informed with a rate-limited description of the ESI. Therefore, the
innovation in Case 1 is that the decoder is also informed with the DSI. The solution for this problem can be derived
by considering the DSI to be a part of the channel's output in Steinberg's solution. In the proof of the converse in
his paper, Steinberg uses a new technique that involves using the Csiszar sum twice in order to get to a single-letter
bound on the rate. We shall use this technique to present a duality in the converse of the Gelfand-Pinsker [3]
and the Wyner-Ziv [1] problems, which, by themselves, constitute the basis for most of the results in this paper.
In [1], Wyner and Ziv present the rate-distortion function for data compression problems with side information
at the decoder. We make use of their coding scheme in the achievability proof of the lower bound of Case 2 for
describing the ESI with a limited rate at the decoder. In [3], Gelfand and Pinsker present the capacity for a channel
with noncausal CSI at the encoder. We use their coding scheme in the achievability proof of Case 1 and the lower
bound of Case 2 for transmitting information over a channel where the ESI is the state information at the encoder.
Therefore, we combine in our problems the Gelfand-Pinsker and the Wyner-Ziv problems. Another related paper
is [4], in which Shannon presented the capacity of a channel with causal CSI at the transmitter. We make use of
Shannon's result in the achievability proof of Case 2c for communicating over a channel with causal ESI at the
encoder. We also use Shannon's strategies [4] for developing an iterative algorithm to calculate the capacity of the
cases we present in this paper.

Some related papers that can be found in the literature are mentioned herein. Heegard and El Gamal [5] presented
a model of a state-dependent channel, where the transmitter is informed with the CSI at a rate limited to $R_e$ and
the receiver is informed with the CSI at a rate limited to $R_d$. This result relates to Case 1, Case 2 and Case 2c
since we consider the rate-limited description of the ESI or the DSI as side information known at both the encoder
and the decoder. Cover and Chiang [6] extended the Gelfand-Pinsker problem and the Wyner-Ziv problem to the
case where both the encoder and the decoder are provided with different, but correlated, partial side information.
They also showed a duality between the two cases, which is a topic that will be discussed later in this paper.
Rozenzweig, Steinberg and Shamai [7] and Cemal and Steinberg [8] studied channels with partial state information
at the transmitter. A detailed subject review on channel coding with state information was given by Keshet, Steinberg
and Merhav in [9].

In addition to these three cases, we also present a more general case, where the encoder is informed with increased
ESI and the decoder is informed with increased DSI; i.e., there is a rate-limited description of the ESI at the decoder
and there is a rate-limited description of the DSI at the encoder. We provide an achievability scheme that bounds
the capacity for this case from below; however, this bound does not coincide with the capacity and, therefore, this
problem remains open.



B. Rate-distortion with side information 

In this paper we address three problems of rate-distortion with side information, as presented in Figure 3. In
common with the channel capacity problems, we assume that the encoder is informed with the ESI ($S_1$) and the
decoder is informed with the DSI ($S_2$), where the source, $X$, the ESI and the DSI are correlated. The rate-distortion
problem cases we investigate in this paper are:

• Case 1: The decoder is provided with increased DSI.

• Case 1c: Similar to Case 1, with the exception that the ESI is known to the encoder in a causal manner. The
rate-limited description of the ESI is still known to the decoder noncausally.

• Case 2: The encoder is informed with increased ESI.

Case 2 is a special case of Kaspi's [10] two-way source coding for $K = 1$. In [10], Kaspi introduced a model of
multistage communication between two users, where each user may transmit up to $K$ messages to the other user,
dependent on the source and the previous received messages. For Case 2, we can consider sending the rate-limited
description of the DSI as the first transmission and then sending a function of the source, the ESI and the
rate-limited description of the DSI as the second transmission. This fits into Kaspi's problem for $K = 1$ and thus Kaspi's
theorem also applies to Case 2. Kaspi's problem was later extended by Permuter, Steinberg and Weissman [11] to
the case where a common rate-limited side information message is being conveyed to both users. Another strongly
related paper is Wyner and Ziv's paper [1]. In the achievability of Case 1 we use the Wyner-Ziv coding scheme
twice: once for describing the ESI at the decoder where the DSI is the side information and once for the main source
and the ESI where the DSI is the side information. The rate-limited description of the ESI is the side information
provided to both the encoder and the decoder. In [6] there is an extension of the Wyner-Ziv problem to the case
where both the encoder and the decoder are provided with correlated partial side information. Weissman and El
Gamal [12, Section 2] and Weissman and Merhav [13] presented source coding with causal side information at the
decoder, which relates to Case 1c.

As with the channel capacity, we present a bound on the general case of rate-distortion with two-sided increased
partial side information. In this problem setup the encoder is informed with a rate-limited description of the DSI
in addition to the ESI and the decoder is informed with a rate-limited description of the ESI in addition to the
DSI. We present an achievability scheme that bounds the optimal rate from above; however, this bound does not
coincide with the optimal rate and, therefore, this problem remains open.

C. Duality 

Within the scope of this work we point out a duality relation between the channel capacity and the rate-distortion
cases we discuss. The operational duality between channel coding and source coding was first mentioned by Shannon
[14]. In [15], Pradhan, Chou and Ramchandran studied the functional duality between some cases of channel coding
and source coding, including the duality between the Gelfand-Pinsker problem and the Wyner-Ziv problem. This
duality was also described by Cover and Chiang in [6], where they provided a transformation that makes duality
between channel coding and source coding with two-sided state information apparent. Zamir, Shamai and Erez [16]
and Su, Eggers and Girod [17] utilized the duality between channel coding and source coding with side information
to develop coding schemes for the dual problems.

In our paper we show that the channel capacity cases and the rate-distortion cases we discuss are operational
duals in a way that strongly relates to the Wyner-Ziv and Gelfand-Pinsker duality. We also provide a transformation
scheme that shows this duality in a clear way. Moreover, we show a duality relation between Kaspi's problem and
Steinberg's [2] problem by showing a duality relation between Case 2 source coding and Case 1 channel coding.
Also, we show duality in the converse parts of the Gelfand-Pinsker and the Wyner-Ziv problems. We show that
both converse parts can be proven in a perfectly dual way by using the Csiszar sum twice.

D. Computational algorithms 

Calculating channel capacity and rate-distortion problems, in general, and the Gelfand-Pinsker and the Wyner-Ziv
problems, in particular, is not straightforward. Blahut [18] and Arimoto [19] suggested an iterative algorithm
(to be referred to as the B-A algorithm) for numerically computing the channel capacity and the rate-distortion
problems. Willems [20] and Dupuis, Yu and Willems [21] presented iterative algorithms based on the B-A algorithm
for computing the Gelfand-Pinsker and the Wyner-Ziv functions. We use principles from Willems' algorithms to
develop an algorithm to numerically calculate the capacity for the cases we presented. More B-A based iterative
algorithms for computing channel capacity and rate-distortion with side information can be found in [22] and in [23].
A B-A based algorithm for maximizing the directed information can be found in [24].
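
For orientation, the classical B-A iteration for a channel without side information can be sketched as follows. This is only an illustrative sketch of the textbook algorithm, not the extended algorithm developed in this paper; the function name `blahut_arimoto_capacity`, the matrix layout `P[x, y] = p(y|x)` and the stopping rule are assumptions made for this example.

```python
import numpy as np

def blahut_arimoto_capacity(P, tol=1e-9, max_iter=10_000):
    """Approximate the capacity (in bits) of a discrete memoryless channel.

    P[x, y] = p(y|x) is a row-stochastic matrix (hypothetical layout).
    Returns a lower bound on the capacity and the input distribution.
    """
    P = np.asarray(P, dtype=float)
    n_x = P.shape[0]
    r = np.full(n_x, 1.0 / n_x)          # start from the uniform input pmf
    lower = 0.0
    for _ in range(max_iter):
        q_y = r @ P                      # output marginal induced by r
        # d[x] = D( p(.|x) || q_y ) in nats, with the convention 0 log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(P > 0, np.log(P / q_y), 0.0)
        d = np.sum(P * log_ratio, axis=1)
        lower = np.log(np.sum(r * np.exp(d)))   # lower bound on capacity
        upper = np.max(d)                       # upper bound on capacity
        r = r * np.exp(d)                       # multiplicative update
        r /= r.sum()
        if upper - lower < tol:
            break
    return lower / np.log(2), r

# Binary symmetric channel with crossover probability 0.1
capacity_bsc, r_bsc = blahut_arimoto_capacity([[0.9, 0.1], [0.1, 0.9]])
```

The gap between the two bounds gives a natural stopping criterion; the algorithms for channels with side information cited above follow the same alternating-maximization idea over a larger set of distributions.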

Another approach for solving the Wyner-Ziv rate-distortion problem is the geometric programming approach.
This approach was presented by Chiang and Boyd in their paper [25], in which they described methods, based on
convex optimization and geometric programming, to calculate the channel capacity of the Gelfand-Pinsker channel
and to calculate a lower bound on the rate-distortion of the Wyner-Ziv problem. Chiang and Boyd considered the
Lagrange dual of the Wyner-Ziv problem and they formulated a geometric program that constitutes a lower bound
on the rate-distortion. However, their lower bound is not tight because they implicitly used the assumption that the
derivative of the Lagrangian is zero for each value of the side information individually, while the original expression
is only restricted to zero when averaging over the side information. During our present work, we found a tight
lower bound on the rate-distortion of the Wyner-Ziv problem. The tight bound is obtained by considering a primal
variable in the dual problem. A similar trick has been used recently by Naiss and Permuter [26] for transforming
the rate-distortion with feed-forward problem into a geometric program.

E. Organization of the paper and main contributions 

To summarize, the main contributions of this paper are: 1) we give single-letter characterizations of the capacity
and the rate-distortion functions of new channel and source coding problems with increased partial side information,
2) we show a duality relationship between the channel capacity cases and the rate-distortion cases that we discuss, 3)
we provide a tight lower bound on the Wyner-Ziv solution using convex optimization and geometric programming
tools, 4) we provide a B-A based algorithm to solve the channel capacity problems we describe, 5) we show a
duality between the Gelfand-Pinsker capacity converse and the Wyner-Ziv rate-distortion converse.

Fig. 2: Channel coding with state information. Case 1: Rate-limited ESI at the decoder. Case 2: Rate-limited DSI at the
encoder. Case 2c: Causal ESI and rate-limited DSI at the encoder.

Fig. 3: Source coding with side information. Case 2: Rate-limited DSI at the encoder. Case 1: Rate-limited ESI at the
decoder. Case 1c: Causal DSI and rate-limited ESI at the decoder. The cases are presented in this order to allow each
source coding case to be parallel to the dual channel coding case.



The remainder of this paper is organized as follows. In Section II we introduce some notations for this paper and
provide the settings of three channel coding and three source coding cases with increased partial side information.
In Section III we present the main results for coding with increased partial side information; we provide the capacity
and the rate-distortion for the cases we introduced in Section II and we point out the duality between the cases
we examined. Section IV contains the main results for the geometric programming; we formulate a geometric
program that is a tight lower bound on the Wyner-Ziv solution. Section V contains illuminating examples for the
cases discussed in the paper. In Section VI we describe the B-A based algorithm we used in order to solve the
capacity examples. We conclude the paper in Section VII and we highlight two open problems: channel capacity
and rate-distortion with two-sided rate-limited partial side information. Appendix A contains the duality derivation
for the converse proofs of the Gelfand-Pinsker and the Wyner-Ziv problems and Appendices B through F contain
the proofs for our theorems and lemmas.



II. Problem Setting and Definitions 



In this section we describe and formally define three cases of channel coding problems and three cases of source
coding problems. All six cases are presented in Figures 2 and 3.

Notations. We use subscripts and superscripts to denote vectors in the following ways: $x^j = (x_1, \ldots, x_j)$ and
$x_i^j = (x_i, \ldots, x_j)$ for $i < j$. Moreover, we use the lower case $x$ to denote a sample value, the upper case $X$ to
denote a random variable, the calligraphic letter $\mathcal{X}$ to denote the alphabet of $X$, $|\mathcal{X}|$ to denote the cardinality of
the alphabet of $X$ and $p(x)$ to denote the probability $\Pr\{X = x\}$. We use the notation $T_\epsilon^{(n)}(X)$ to denote the
strongly typical set of the random variable $X$, as defined in [27, Chapter 11].



A. Definitions and problem formulation - channel coding with state information 

Definition 1. A discrete channel is defined by the set $\{\mathcal{X}, \mathcal{S}_1, \mathcal{S}_2, p(s_1, s_2), p(y|x, s_1, s_2), \mathcal{Y}\}$. The channel's input
sequence, $\{X_i \in \mathcal{X}, i = 1, 2, \ldots\}$, the ESI sequence, $\{S_{1,i} \in \mathcal{S}_1, i = 1, 2, \ldots\}$, the DSI sequence,
$\{S_{2,i} \in \mathcal{S}_2, i = 1, 2, \ldots\}$, and the channel's output sequence, $\{Y_i \in \mathcal{Y}, i = 1, 2, \ldots\}$, are discrete random variables drawn from
the finite alphabets $\mathcal{X}$, $\mathcal{S}_1$, $\mathcal{S}_2$, $\mathcal{Y}$, respectively. Denote the message and the message space as $W \in \{1, 2, \ldots, 2^{nR}\}$
and let $\hat{W}$ be the reconstruction of the message $W$. The random variables $(S_{1,i}, S_{2,i})$ are i.i.d. $\sim p(s_1, s_2)$ and the
channel is memoryless, i.e., at time $i$, the output, $Y_i$, has a conditional distribution of

$$p(y_i | x^i, s_1^i, s_2^i, y^{i-1}) = p(y_i | x_i, s_{1,i}, s_{2,i}). \tag{1}$$

In the remainder of the paper, unless specifically mentioned otherwise, we refer to the ESI and the DSI as if they
are known to the encoder and the decoder, respectively, in a noncausal manner. Also, as noted before, we use the
term increased side information to indicate that the user's side information also includes a rate-limited description
of the other user's partial side information. For example, when the decoder is informed with the DSI and with a
rate-limited description of the ESI, we say that the decoder is informed with increased DSI.

Problem Formulation. For the channel $p(y|x, s_1, s_2)$, consider the following channel coding problem cases:

• Case 1: The encoder is informed with ESI and the decoder is informed with increased DSI.

• Case 2: The encoder is informed with increased ESI and the decoder is informed with DSI.

• Case 2c: The encoder is informed with increased causal ESI ($S_1^i$ at time $i$) and the decoder is informed with
DSI. This case is the same as Case 2, except for the causal ESI.

All cases are presented in Figure 2.

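Definition 2. A $(n, 2^{nR}, 2^{nR_j'})$ code, $j \in \{1, 2\}$, for a channel with increased partial side information, as illustrated
in Figure 2, consists of two encoders and one decoder. The encoders are $f$ and $f_v$, where $f$ is the encoder for the
channel's input and $f_v$ is the encoder for the side information, and the decoder is $g$, as described for each case:

Case 1: Two encoders

$$\begin{aligned}
f_v &: \mathcal{S}_1^n \mapsto \{1, 2, \ldots, 2^{nR_1'}\}, \\
f &: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR_1'}\} \mapsto \mathcal{X}^n,
\end{aligned}$$

and a decoder

$$g : \mathcal{Y}^n \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR_1'}\} \mapsto \{1, 2, \ldots, 2^{nR}\}. \tag{2}$$

Case 2: Two encoders

$$\begin{aligned}
f_v &: \mathcal{S}_2^n \mapsto \{1, 2, \ldots, 2^{nR_2'}\}, \\
f &: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR_2'}\} \mapsto \mathcal{X}^n,
\end{aligned}$$

and a decoder

$$g : \mathcal{Y}^n \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR_2'}\} \mapsto \{1, 2, \ldots, 2^{nR}\}. \tag{3}$$

Case 2c: Two encoders

$$\begin{aligned}
f_v &: \mathcal{S}_2^n \mapsto \{1, 2, \ldots, 2^{nR_2'}\}, \\
f_i &: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_1^i \times \{1, 2, \ldots, 2^{nR_2'}\} \mapsto \mathcal{X}_i,
\end{aligned}$$

and a decoder

$$g : \mathcal{Y}^n \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR_2'}\} \mapsto \{1, 2, \ldots, 2^{nR}\}. \tag{4}$$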



The average probability of error, $P_e^{(n)}$, for a $(n, 2^{nR}, 2^{nR_j'})$ code is defined as

$$P_e^{(n)} = \frac{1}{2^{nR}} \sum_{w=1}^{2^{nR}} \Pr\{\hat{W} \neq w \mid W = w\}, \tag{5}$$

where the index $W$ is chosen according to a uniform distribution over the set $\{1, 2, \ldots, 2^{nR}\}$. A rate pair $(R, R')$
is said to be achievable if there exists a sequence of $(n, 2^{nR}, 2^{nR'})$ codes such that the average probability of error
$P_e^{(n)} \to 0$ as $n \to \infty$.

Definition 3. The capacity of the channel, $C(R')$, is the supremum of all $R$ such that the rate pair $(R, R')$ is
achievable.

B. Definitions and problem formulation - source coding with side information 

Throughout this article we use the common definitions of rate-distortion as presented in [27].

Definition 4. The source sequence $\{X_i \in \mathcal{X}, i = 1, 2, \ldots\}$, the ESI sequence $\{S_{1,i} \in \mathcal{S}_1, i = 1, 2, \ldots\}$ and the
DSI sequence $\{S_{2,i} \in \mathcal{S}_2, i = 1, 2, \ldots\}$ are discrete random variables drawn from the finite alphabets $\mathcal{X}$, $\mathcal{S}_1$ and
$\mathcal{S}_2$, respectively. The random variables $(X_i, S_{1,i}, S_{2,i})$ are i.i.d. $\sim p(x, s_1, s_2)$. Let $\hat{\mathcal{X}}$ be the reconstruction alphabet
and $d : \mathcal{X} \times \hat{\mathcal{X}} \mapsto [0, \infty)$ be the distortion measure. The distortion between sequences is defined in the usual way:

$$d(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i). \tag{6}$$
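
As a small illustration of the per-letter average distortion between sequences, the following sketch evaluates it under the Hamming distortion measure. The function names `sequence_distortion` and `hamming` are hypothetical and chosen only for this example; the paper itself does not fix a particular distortion measure here.

```python
def hamming(a, b):
    """Hamming distortion: 0 if the symbols agree, 1 otherwise."""
    return 0.0 if a == b else 1.0

def sequence_distortion(x, x_hat, d=hamming):
    """Average per-letter distortion (1/n) * sum_i d(x_i, x_hat_i)."""
    if len(x) != len(x_hat) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d(a, b) for a, b in zip(x, x_hat)) / len(x)

# One mismatch out of four symbols
example_distortion = sequence_distortion([0, 1, 1, 0], [0, 1, 0, 0])
```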



Problem Formulation. For the source, $X$, the ESI, $S_1$, and the DSI, $S_2$, consider the following source coding
problem cases:

• Case 1: The encoder is informed with ESI and the decoder is informed with increased DSI.

• Case 2: The encoder is informed with increased ESI and the decoder is informed with DSI.

• Case 1c: The encoder is informed with ESI and the decoder is informed with increased causal DSI ($S_2^i$ at
time $i$). This case is the same as Case 1, except for the causal DSI.

All cases are presented in Figure 3.



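Definition 5. A $(n, 2^{nR}, 2^{nR_j'}, D)$ code, $j \in \{1, 2\}$, for the source $X$ with increased partial side information, as
illustrated in Figure 3, consists of two encoders, one decoder and a distortion constraint. The encoders are $f$ and
$f_v$, where $f$ is the encoder for the source and $f_v$ is the encoder for the side information, and the decoder is $g$, as
described for each case:

Case 1: Two encoders

$$\begin{aligned}
f_v &: \mathcal{S}_1^n \mapsto \{1, 2, \ldots, 2^{nR_1'}\}, \\
f &: \mathcal{X}^n \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR_1'}\} \mapsto \{1, 2, \ldots, 2^{nR}\},
\end{aligned}$$

and a decoder

$$g : \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR_1'}\} \mapsto \hat{\mathcal{X}}^n. \tag{7}$$

Case 2: Two encoders

$$\begin{aligned}
f_v &: \mathcal{S}_2^n \mapsto \{1, 2, \ldots, 2^{nR_2'}\}, \\
f &: \mathcal{X}^n \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR_2'}\} \mapsto \{1, 2, \ldots, 2^{nR}\},
\end{aligned}$$

and a decoder

$$g : \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR_2'}\} \mapsto \hat{\mathcal{X}}^n. \tag{8}$$

Case 1c: Two encoders

$$\begin{aligned}
f_v &: \mathcal{S}_1^n \mapsto \{1, 2, \ldots, 2^{nR_1'}\}, \\
f &: \mathcal{X}^n \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR_1'}\} \mapsto \{1, 2, \ldots, 2^{nR}\},
\end{aligned}$$

and a decoder

$$g_i : \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_2^i \times \{1, 2, \ldots, 2^{nR_1'}\} \mapsto \hat{\mathcal{X}}_i. \tag{9}$$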

The distortion constraint for all three cases is:

$$E\left[\frac{1}{n} \sum_{i=1}^{n} d(X_i, \hat{X}_i)\right] \le D. \tag{10}$$

For a given distortion, $D$, and for any $\epsilon > 0$, the rate pair $(R, R')$ is said to be achievable if there exists a
$(n, 2^{nR}, 2^{nR'}, D + \epsilon)$ code for the rate-distortion problem.

Definition 6. For a given $R'$ and distortion $D$, the operational rate $R^*(R', D)$ is the infimum of all $R$ such that
the rate pair $(R, R')$ is achievable.



III. Coding with Increased Partial Side Information - Main Results 

In this section we present the main results of this paper. We will first present the results for the channel coding
cases, then the main results for the source coding cases and, finally, we will present the duality between them.

A. Channel coding with side information

For a channel with two-sided state information as presented in Figure 2, where $(S_{1,i}, S_{2,i}) \sim p(s_1, s_2)$, the
capacity is as follows.

Theorem 1 (The capacity for the cases in Figure 2). For the memoryless channel $p(y|x, s_1, s_2)$, where $S_1$ is the
ESI and $S_2$ is the DSI and the side information $(S_{1,i}, S_{2,i}) \sim p(s_1, s_2)$, the channel capacity is

Case 1: The encoder is informed with ESI and the decoder is informed with increased DSI,

$$C_1^* = \max_{\substack{p(v_1|s_1)\,p(u|s_1,v_1)\,p(x|u,s_1,v_1) \\ \text{s.t. } R' \ge I(V_1;S_1) - I(V_1;Y,S_2)}} I(U;Y,S_2|V_1) - I(U;S_1|V_1). \tag{11}$$

Case 2: The encoder is informed with increased ESI and the decoder is informed with DSI;
Lower bounded by

$$C_2^{lb*} = \max_{\substack{p(v_2|s_2)\,p(u|s_1,v_2)\,p(x|u,s_1,v_2) \\ \text{s.t. } R' \ge I(V_2;S_2|S_1)}} I(U;Y,S_2|V_2) - I(U;S_1|V_2). \tag{12}$$

Upper bounded by

$$C_2^{ub1*} = \max_{\substack{p(v_2|s_1,s_2)\,p(u|s_1,v_2)\,p(x|u,s_1,v_2) \\ \text{s.t. } R' \ge I(V_2;S_2|S_1)}} I(U;Y,S_2|V_2) - I(U;S_1|V_2) \tag{13}$$

and by

$$C_2^{ub2*} = \max_{\substack{p(v_2|s_2)\,p(u|s_1,s_2,v_2)\,p(x|u,s_1,v_2) \\ \text{s.t. } R' \ge I(V_2;S_2|S_1)}} I(U;Y,S_2|V_2) - I(U;S_1|V_2). \tag{14}$$

Case 2c: The encoder is informed with increased causal ESI ($S_1^i$ at time $i$) and the decoder is informed with DSI,

$$C_{2c}^* = \max_{\text{s.t. } R' \ge I(V_2;S_2)} I(U;Y,S_2|V_2). \tag{15}$$

For case $j$, $j \in \{1, 2\}$, some joint distribution, $p(s_1, s_2, v_j, u, x, y)$, and $(U, V_j)$ being some auxiliary random
variables with bounded cardinality.
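
The objectives and constraints in the theorem are combinations of (conditional) mutual information terms. For a candidate joint pmf stored as a multi-dimensional array, such terms can be evaluated numerically as in the following sketch. The helper names `entropy` and `cond_mi`, and the convention that each array axis corresponds to one random variable, are assumptions made only for this illustration.

```python
import numpy as np

def entropy(p, axes_keep):
    """Entropy (bits) of the marginal of p on the axes in axes_keep."""
    drop = tuple(a for a in range(p.ndim) if a not in axes_keep)
    marginal = np.asarray(p.sum(axis=drop)).ravel()
    marginal = marginal[marginal > 0]          # convention: 0 log 0 = 0
    return float(-np.sum(marginal * np.log2(marginal)))

def cond_mi(p, A, B, C=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C), axes given as index tuples."""
    A, B, C = set(A), set(B), set(C)
    return (entropy(p, A | C) + entropy(p, B | C)
            - entropy(p, A | B | C) - entropy(p, C))

# Toy joint pmf p[x, y, z]: X, Y independent fair bits and Z = X xor Y
p_xor = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        p_xor[x, y, x ^ y] = 0.25

i_xy = cond_mi(p_xor, A=(0,), B=(1,))                 # I(X;Y)
i_xy_given_z = cond_mi(p_xor, A=(0,), B=(1,), C=(2,)) # I(X;Y|Z)
```

With axes ordered, for instance, as $(s_1, s_2, v_1, u, x, y)$, the Case 1 objective would be `cond_mi(p, A=(3,), B=(5, 1), C=(2,)) - cond_mi(p, A=(3,), B=(0,), C=(2,))`; the axis order is again only a convention of this sketch.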



Appendix B contains the proof.

Lemma 1. For all three channel coding cases described in this section and for $j \in \{1, 2\}$, the following statements
hold.

(i) The function $C_j(R')$ is a concave function of $R'$.
(ii) It is enough to take $X$ to be a deterministic function of $(U, S_1, V_j)$ to evaluate $C_j$.
(iii) The auxiliary alphabets $\mathcal{U}$ and $\mathcal{V}_j$ satisfy

for Case 1: $|\mathcal{V}_1| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1$ and
$|\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| \left(|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1\right)$,

for Case 2: $|\mathcal{V}_2| \le |\mathcal{S}_1||\mathcal{S}_2| + 1$ and
$|\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| \left(|\mathcal{S}_1||\mathcal{S}_2| + 1\right)$,

for Case 2c: $|\mathcal{V}_2| \le |\mathcal{S}_2| + 1$ and
$|\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_2| \left(|\mathcal{S}_2| + 1\right)$.

Appendix D contains the proof for the above lemma.

Remark: We assume that the lower bound of Case 2 is tight, namely, $C_2^* = C_2^{lb*}$. This claim is hard to corroborate;
we have not, as yet, derived a converse proof that maintains both Markov relations $V_2 - S_2 - S_1$ and $U - (S_1, V_2) - S_2$
and that bounds any achievable rate from above simultaneously.

B. Source coding with side information 

For the problem of source coding with side information as presented in Figure 3, the rate-distortion function is
as follows:

Theorem 2 (The rate-distortion function for the cases in Figure 3). For a bounded distortion measure $d(x, \hat{x})$, a
source, $X$, and side information, $S_1, S_2$, where $(X_i, S_{1,i}, S_{2,i}) \sim p(x, s_1, s_2)$, the rate-distortion function is

Case 1: The encoder is informed with ESI and the decoder is informed with increased DSI,

$$R_1^*(D) = \min_{\substack{p(v_1|s_1)\,p(u|x,s_1,v_1)\,p(\hat{x}|u,s_2,v_1) \\ \text{s.t. } R' \ge I(V_1;S_1|S_2)}} I(U;X,S_1|V_1) - I(U;S_2|V_1). \tag{16}$$

Case 1c: The encoder is informed with ESI and the decoder is informed with increased causal DSI ($S_2^i$ at time $i$),

$$R_{1c}^*(D) = \min_{\substack{p(v_1|s_1)\,p(u|x,s_1,v_1)\,p(\hat{x}|u,s_2,v_1) \\ \text{s.t. } R' \ge I(V_1;S_1)}} I(U;X,S_1|V_1). \tag{17}$$

Case 2: The encoder is informed with increased ESI and the decoder is informed with DSI,

$$R_2^*(D) = \min_{\substack{p(v_2|s_2)\,p(u|x,s_1,v_2)\,p(\hat{x}|u,s_2,v_2) \\ \text{s.t. } R' \ge I(V_2;S_2) - I(V_2;X,S_1)}} I(U;X,S_1|V_2) - I(U;S_2|V_2). \tag{18}$$

For case $j$, $j \in \{1, 2\}$, some joint distribution, $p(x, s_1, s_2, v_j, u, \hat{x})$, where $E\left[\frac{1}{n}\sum_{i=1}^{n} d(X_i, \hat{X}_i)\right] \le D$ and $(U, V_j)$
being some auxiliary random variables with bounded cardinality.
Appendix C contains the proof.

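Lemma 2. For all cases of rate-distortion problems in this section and for $j \in \{1, 2\}$, the following statements
hold.

(i) The function $R_j(R', D)$ is a convex function of $R'$ and $D$.
(ii) It is enough to take $\hat{X}$ to be a deterministic function of $(U, S_2, V_j)$ to evaluate $R_j$.
(iii) The auxiliary alphabets $\mathcal{U}$ and $\mathcal{V}_j$ satisfy

for Case 1: $|\mathcal{V}_1| \le |\mathcal{S}_1||\mathcal{S}_2| + 1$ and
$|\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| \left(|\mathcal{S}_1||\mathcal{S}_2| + 1\right)$,

for Case 1c: $|\mathcal{V}_1| \le |\mathcal{S}_1| + 1$ and
$|\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_1| \left(|\mathcal{S}_1| + 1\right)$,

for Case 2: $|\mathcal{V}_2| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1$ and
$|\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| \left(|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1\right)$.

Appendix D contains the proof for the above lemma.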

C. Main results - duality 

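We now investigate the duality between the channel coding and the source coding for the cases in Figures 2
and 3. The following transformation makes the duality between the channel coding cases 1, 2, 2c and the source
coding cases 2, 1, 1c, respectively, evident. The left column corresponds to channel coding and the right column
to source coding. For cases $j$ and $\bar{j}$, where $j, \bar{j} \in \{1, 2\}$ and $\bar{j} \neq j$, consider the transformation:

$$\begin{array}{rcll}
\text{channel coding} & \longleftrightarrow & \text{source coding} & (19) \\
C & \longleftrightarrow & R(D) & (20) \\
\text{maximization} & \longleftrightarrow & \text{minimization} & (21) \\
C_j & \longleftrightarrow & R_{\bar{j}}(D) & (22) \\
X & \longleftrightarrow & \hat{X} & (23) \\
Y & \longleftrightarrow & X & (24) \\
S_j & \longleftrightarrow & S_{\bar{j}} & (25) \\
V_j & \longleftrightarrow & V_{\bar{j}} & (26) \\
U & \longleftrightarrow & U & (27) \\
R' & \longleftrightarrow & R' & (28)
\end{array}$$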



This transformation is an extension of the transformation provided in [6] and in [15]. Note that while the channel
capacity formula in Case $j$ and the rate-distortion function in Case $\bar{j}$ are dual to one another in the sense of
maximization-minimization, the corresponding rates $R'$ are not dual to each other in this sense; i.e., one would
expect to see an opposite inequality ($\ge \leftrightarrow \le$) for dual cases, whereas we have an inequality that is in the same
direction ($\le \leftrightarrow \le$) in the $R'$ formulas. The duality in the side information rates, $R'$, is then in the sense that the
arguments in the formulas for the dual $R'$ are dual. This exception is due to the fact that while the Gelfand-Pinsker
and the Wyner-Ziv problems for the main channel or the main rate-distortion problems are dual, the Wyner-Ziv
problem for the side information stays the same; the only difference is the input and the output.

IV. Geometric Programming

In this section, we provide a method to evaluate the Wyner-Ziv rate, using the Lagrange dual function and 
geometric programming. Before presenting the main results on this subject, let us provide the definitions and 
notations that we will use throughout this section and throughout the proof of the forthcoming main results. 

A. Definitions and preliminaries - convex optimization and Lagrange duality 

Most of the notations and the definitions that we use in this section are taken from [28]. We denote the variable 
X with dimension greater than 1 as x and we use x ^^ to denote that x^ > for alH = 1, 2, . . . , dim(x). 
Consider the following optimization problem: 

minimize /o(x) 

subject to /i(x) < 0, i = 1, 2, . . . , m, (29) 

/ij(x)=0, j = l,2,...,p, 

with the variable x e K." . We refer to /o as the objective function of the optimization problem and to fi and hj 
as the constraint functions . We let T) denote the domain of x; this is the set of all points for which the objective 
and the constraint functions are defined. We denote the optimal minimizer of /o(x) in V as x*. If the objective 
function, /o(x), and the inequality constraint functions, fi{'x.), i = 1, 2, . . . , m, are all convex in x and the equality 
constraint functions, hj{x.), j ~ 1,2, ... ,p, are affine in x, then the problem is said to be a convex optimization 
problem. The Lagrangian associated with problem ( |29] l is 



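$$L(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = f_0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i f_i(\mathbf{x}) + \sum_{j=1}^{p} \mu_j h_j(\mathbf{x}), \tag{30}$$

where $\mathbf{x} \in \mathcal{D}$, $\boldsymbol{\lambda} \in \mathbb{R}^m$ and $\boldsymbol{\mu} \in \mathbb{R}^p$. The Lagrange dual function, as defined in [28, Chapter 5.1.2], is

$$g(\boldsymbol{\lambda}, \boldsymbol{\mu}) = \inf_{\mathbf{x} \in \mathcal{D}} L(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}). \tag{31}$$

Following from [28, Chapter 5.1.3], for any $\boldsymbol{\lambda}$ where $\lambda_i \geq 0$ for $i = 1, 2, \ldots, m$, the Lagrange dual function yields a lower bound on the optimal value, $f_0(\mathbf{x}^*)$. The Lagrange dual problem [28, Chapter 5.2] associated with (29) is

$$\begin{aligned}
\text{maximize} \quad & g(\boldsymbol{\lambda}, \boldsymbol{\mu}) \\
\text{subject to} \quad & \lambda_i \geq 0, \quad i = 1, 2, \ldots, m.
\end{aligned} \tag{32}$$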



In this context, we refer to the original problem (29) as the primal problem. The strong duality property is associated with the case where the solution of the dual problem and the solution of the primal problem coincide. Following from [28, Chapter 5.2.3], if the primal problem is convex and Slater's condition [28, Chapter 5.2.3] holds, then strong duality holds.
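The definitions above can be checked on a toy problem. The following is a minimal sketch, assuming the one-dimensional convex problem "minimize $x^2$ subject to $1 - x \leq 0$" (this problem and all names below are illustrative and do not appear in the paper). Its Lagrangian is $x^2 + \lambda(1 - x)$, which is minimized at $x = \lambda/2$, so the dual function is $g(\lambda) = \lambda - \lambda^2/4$. Slater's condition holds (for example, $x = 2$ is strictly feasible), so the dual optimum should meet the primal optimum.

```python
# Toy illustration of weak and strong Lagrange duality (hypothetical example).

def f0(x):
    """Objective of the toy primal problem."""
    return x * x

def f1(x):
    """Inequality constraint f1(x) <= 0, i.e. x >= 1."""
    return 1.0 - x

def dual_function(lam):
    """g(lam) = inf_x L(x, lam); the infimum is attained at x = lam / 2."""
    x_min = lam / 2.0
    return f0(x_min) + lam * f1(x_min)

# The primal optimum is attained at x* = 1.
primal_opt = f0(1.0)

# Evaluate the dual function on a grid of lam >= 0.
lams = [k / 100.0 for k in range(0, 501)]
dual_values = [dual_function(lam) for lam in lams]
best_dual = max(dual_values)
best_lam = lams[dual_values.index(best_dual)]

print("primal optimum:", primal_opt)
print("best dual value:", best_dual, "at lambda =", best_lam)
```

Every value of the dual function lies below the primal optimum (weak duality), and the maximum over the grid coincides with it (strong duality).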

A special family of optimization problems that we are interested in is the family of geometric programs. This type of optimization problem is defined in [28, Chapter 4.5] and is summarized here. Define a monomial as the function

$$f(\mathbf{x}) = c\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}, \tag{33}$$

where $c > 0$ and $a_i \in \mathbb{R}$. A sum of monomials, i.e., a function of the form

$$f(\mathbf{x}) = \sum_{k=1}^{K} c_k\, x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}}, \tag{34}$$

where $c_k > 0$, is called a posynomial. An optimization problem of the form

$$\begin{aligned}
\text{minimize} \quad & f_0(\mathbf{x}) \\
\text{subject to} \quad & f_i(\mathbf{x}) \leq 1, \quad i = 1, 2, \ldots, m, \\
& h_j(\mathbf{x}) = 1, \quad j = 1, 2, \ldots, p,
\end{aligned} \tag{35}$$

where $f_0, \ldots, f_m$ are posynomials, $h_1, \ldots, h_p$ are monomials and $\mathbf{x} \succ 0$, is called a geometric program. Geometric programs, as mentioned in [28, Chapter 4.5], are not convex problems in general. However, these problems can be transformed into convex optimization problems by the change of variables $y_i = \log x_i$ and by taking $\log(\cdot)$ of both the objective and the constraint functions.
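The transformation to convex form can be illustrated numerically. The sketch below assumes an arbitrary two-term posynomial (the coefficients, exponents and function names are illustrative, not taken from the paper). After the substitution $y_i = \log x_i$, the logarithm of a posynomial becomes a log-sum-exp of affine functions of $\mathbf{y}$, which is convex.

```python
import math

def posynomial(x, coeffs, exps):
    """Evaluate sum_k c_k * prod_i x_i ** a_ik for x > 0."""
    return sum(c * math.prod(xi ** a for xi, a in zip(x, row))
               for c, row in zip(coeffs, exps))

def log_posynomial(y, coeffs, exps):
    """Evaluate log f(exp(y)) as a log-sum-exp of affine functions of y."""
    terms = [math.log(c) + sum(a * yi for a, yi in zip(row, y))
             for c, row in zip(coeffs, exps)]
    m = max(terms)  # subtract the maximum for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Hypothetical posynomial f(x) = 2 * x1 / x2 + 0.5 * x1**-0.5 * x2**2.
coeffs = [2.0, 0.5]
exps = [[1.0, -1.0], [-0.5, 2.0]]

x = [1.5, 0.7]
y = [math.log(v) for v in x]
print(math.log(posynomial(x, coeffs, exps)), log_posynomial(y, coeffs, exps))
```

Both printed values agree, which confirms that the two functions describe the same quantity in the original and in the logarithmic variables.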



B. Problem Setting and Main Results 

Let us consider the classic Wyner-Ziv problem, as illustrated in Figure 4. Assume correlated random variables $(X, S) \sim$ i.i.d. $p(x, s)$ with finite alphabets $\mathcal{X}$, $\mathcal{S}$, respectively. Let $\{(X_i, S_i)\}_{i=1}^{n}$ be a sequence of $n$ independent drawings of $(X, S)$. Let the sequence $X^n$ be the source sequence and let $S^n$ be the side information sequence available at the decoder. We wish to describe the source, $X$, at rate $R$ bits per symbol and to reconstruct $X$ at the decoder with a distortion smaller than or equal to $D$, i.e., when encoding $X$ in blocks of length $n$, we desire that

$$E\left[\frac{1}{n}\sum_{i=1}^{n} d(X_i, \hat{X}_i)\right] \leq D.$$



Fig. 4: The Wyner-Ziv problem. 



The rate-distortion function with side information at the decoder [1] is

$$R(D) = \min_{p(u|x)\,p(\hat{x}|u,s)} I(U; X|S) \tag{36}$$

for some joint distribution $p(x, s, u, \hat{x})$ such that $E\left[d(X, \hat{X})\right] \leq D$, i.e., $\sum_{x,s,u,\hat{x}} p(x,s)\,p(u|x)\,p(\hat{x}|u,s)\,d(x,\hat{x}) \leq D$. According to [20], we can write the expression of the rate-distortion function as

$$R(D) = \min_{q(t|x)} I(T; X|S) \tag{37}$$

for some joint distribution $p(x, s, t) = p(x, s)\,q(t|x)$, where $\mathcal{T}$ is the set of all mappings

$$t: \mathcal{S} \to \hat{\mathcal{X}}, \tag{38}$$

and the distortion constraint

$$\sum_{x,s,t} p(x, s)\,q(t|x)\,d(x, t(s)) \leq D \tag{39}$$

is maintained. We denote the set of $q(t|x)$'s for all $x \in \mathcal{X}$ and $t \in \mathcal{T}$ as $\mathbf{q} \in \mathbb{R}^{|\mathcal{X}||\mathcal{T}|}$ and we note that $I(T; X|S)$ is a convex function of $\mathbf{q}$ and that the rate-distortion function, $R(D)$, is its optimal value.

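Combining (37) and (39), we get that the Wyner-Ziv problem is the following problem:

$$\begin{aligned}
\text{minimize} \quad & \sum_{x,s,t} p(x,s)\,q(t|x) \log \frac{q(t|x)}{Q(t|s)} \\
\text{subject to} \quad & \sum_{t} q(t|x) = 1 \quad \forall x, \\
& \sum_{x,s,t} p(x,s)\,q(t|x)\,d(x,t(s)) \leq D, \\
& q(t|x) \geq 0 \quad \forall x, t,
\end{aligned} \tag{40}$$

where the variables of the optimization are $\mathbf{q}$ and the constant parameters are the source distribution, $p(x, s)$, the distortion measure, $d\big(x, t(s)\big)$, and the distortion constraint, $D$, for all $x \in \mathcal{X}$, $s \in \mathcal{S}$ and $t \in \mathcal{T}$. The marginal distribution $Q(t|s)$ is defined by

$$Q(t|s) = \frac{\sum_{x} p(x,s)\,q(t|x)}{p(s)}. \tag{41}$$

We define the set of $Q(t|s)$'s for all $s \in \mathcal{S}$ and $t \in \mathcal{T}$ as $\mathbf{Q} \in \mathbb{R}^{|\mathcal{S}||\mathcal{T}|}$.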
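To make the objective and the distortion constraint of (40) concrete, the following minimal sketch evaluates both for a given $q(t|x)$. It assumes binary alphabets, Hamming distortion and a doubly symmetric source with $\Pr\{S \neq X\} = 0.3$; the function and variable names are illustrative and are not part of the paper. The set $\mathcal{T}$ is enumerated as all mappings from $\mathcal{S}$ to $\hat{\mathcal{X}}$, and $Q(t|s)$ is the marginal defined above.

```python
import itertools
import math

def wz_rate_and_distortion(p_xs, q, mappings, d):
    """Return (I(T;X|S) in bits, expected distortion) for p(x,s) q(t|x).

    p_xs[x][s] is the joint source PMF, q[x][t] the test channel,
    mappings[t][s] the reconstruction chosen by strategy t given s.
    """
    nx, ns, nt = len(p_xs), len(p_xs[0]), len(mappings)
    p_s = [sum(p_xs[x][s] for x in range(nx)) for s in range(ns)]
    # Marginal Q(t|s) = sum_x p(x,s) q(t|x) / p(s)
    Q = [[sum(p_xs[x][s] * q[x][t] for x in range(nx)) / p_s[s]
          for t in range(nt)] for s in range(ns)]
    rate, dist = 0.0, 0.0
    for x, s, t in itertools.product(range(nx), range(ns), range(nt)):
        w = p_xs[x][s] * q[x][t]
        if w > 0:
            rate += w * math.log2(q[x][t] / Q[s][t])
            dist += w * d(x, mappings[t][s])
    return rate, dist

hamming = lambda x, x_hat: 0.0 if x == x_hat else 1.0
p_xs = [[0.35, 0.15], [0.15, 0.35]]                 # X ~ Bern(0.5), Pr{S != X} = 0.3
mappings = list(itertools.product([0, 1], repeat=2))  # t = (t(0), t(1))

# Strategy "ignore s and output x": rate H(X|S), zero distortion.
q_describe_x = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
# Strategy "output s regardless of x": zero rate, distortion Pr{S != X}.
q_copy_s = [[0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

print(wz_rate_and_distortion(p_xs, q_describe_x, mappings, hamming))
print(wz_rate_and_distortion(p_xs, q_copy_s, mappings, hamming))
```

The two test channels are the extreme feasible points of (40) for this source: one pays the full rate $H(X|S)$ for zero distortion, the other pays no rate and accepts the distortion of the side information alone.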
The main result of this section is brought in the following theorem. 

Theorem 3. The Lagrange dual of the Wyner-Ziv rate-distortion problem is the following geometric program (in convex form):

$$\begin{aligned}
\text{maximize} \quad & \sum_{x} p(x)\,\alpha_x - \gamma D \\
\text{subject to} \quad & \alpha_x + \sum_{s} p(s|x)\Big[\log p(x|s) - \gamma\, d(x, t(s)) - y_{x,s,t}\Big] \leq 0 \quad \forall x, t, \\
& \log\Big(\sum_{x} \exp\{y_{x,s,t}\}\Big) \leq 0 \quad \forall s, t, \\
& \gamma \geq 0,
\end{aligned} \tag{42}$$

where the optimization variables are $\boldsymbol{\alpha} \in \mathbb{R}^{|\mathcal{X}|}$, $\gamma \in \mathbb{R}_+$ and $\mathbf{y} \in \mathbb{R}^{|\mathcal{X}||\mathcal{S}||\mathcal{T}|}$, and the constant parameters are the source distribution $p(x,s)$, the distortion measure $d\big(x, t(s)\big)$ and the distortion constraint, $D$. Furthermore, if Slater's condition [28, Chapter 5.2.3] holds, then strong duality holds and the solution of the optimization problem in (42) is a tight lower bound on the Wyner-Ziv solution, (40), and $R(D)$ is its optimal value.

Proof: The proof of Theorem 3 is given in Appendix E.
V. Examples

In this section we provide examples for Case 2 of the channel coding theorem and for Case 1 of the source coding theorem. The iterative algorithm that we used to numerically calculate the lower bound, $C_2^{lb}$, is provided in the next section.

Example 1 (Case 2 channel coding for a binary channel). Consider the binary channel illustrated in Figure 5. The alphabets of the input, the output and the two states are binary, $\mathcal{X} = \mathcal{Y} = \mathcal{S}_1 = \mathcal{S}_2 = \{0, 1\}$, with $(S_1, S_2) \sim P_{S_1 S_2}$, where $P_{S_1 S_2}$ is a joint PMF matrix. The channel depends on the states $S_1$ and $S_2$, where the encoder is fully informed with $S_1$ and informed with $S_2$ at a rate limited to $R'$, and the decoder is fully informed with $S_2$. The dependence of the channel on the states is illustrated in Figure 5. If $(S_1 = 1, S_2 = 0)$ then the channel is the Z-channel with transition probability $\epsilon$, if $(S_1 = 1, S_2 = 1)$ then the channel has no error, if $(S_1 = 0, S_2 = 0)$ then the channel is the X-channel and if $(S_1 = 0, S_2 = 1)$ then the channel is the S-channel with transition probability $\epsilon$. The side information's joint PMF, $P_{S_1 S_2}$, is specified by $S_1 \sim$ Bernoulli(0.5) and $\Pr\{S_2 \neq S_1\} = 0.8$, as stated in the caption of Figure 6.

The expressions for the lower bound on the capacity, $C_2^{lb}(R')$, and for $R'$ are brought in Case 2 of Theorem 1.




Fig. 5: Example 1 Channel coding Case 2 - channel topology. 

In Figure 6 we provide the graph of the computed lower bound on the capacity for the binary channel under test. In the graph, we present the lower bound, $C_2^{lb}(R')$, as a function of $R'$. We also provide the Cover & Chiang [8] capacity (where $R' = 0$) and the Gelfand & Pinsker [3] capacity (where $R' = 0$ and the decoder is not informed with $S_2$).

Fig. 6: Example 1. Channel coding Case 2 for the channel depicted in Figure 5, where the side information is distributed $S_1 \sim$ Bernoulli(0.5) and $\Pr\{S_2 \neq S_1\} = 0.8$. $C_2^{lb}(R')$ is the lower bound on the capacity of this channel, C-C rate is the Cover-Chiang rate ($R' = 0$) and G-P rate is the Gelfand-Pinsker rate ($R' = 0$ and the decoder has no side information available at all). Notice that at the encoder the maximal uncertainty about $S_2$ is $H(S_2|S_1) = 0.7219$ bit. Therefore, for any $R' > 0.7219$ ...

Discussion:

1) The algorithm that we used to calculate $C_2^{lb}(R')$ and $R'$ combines a grid search and a Blahut-Arimoto-like algorithm. We first construct a grid of probabilities of the random variable $V_2$ given $S_2$, namely, $w(v_2|s_2)$. Then, for every probability $w(v_2|s_2)$ such that $I(V_2; S_2|S_1)$ is close enough to $R'$, we calculate the maximum of $I(U; Y, S_2|V_2) - I(U; S_1|V_2)$ using the iterative algorithm described in the next section. We then choose the maximum over those maxima and declare it to be $C_2^{lb}$. By taking a fine grid of the probabilities $w(v_2|s_2)$, the result can be made arbitrarily close to $C_2^{lb}$.

2) For a given joint PMF matrix $P_{S_1 S_2}$, we can see that $C_2^{lb}(R')$ is non-decreasing in $R'$. Furthermore, since the expression $I(V_2; S_2|S_1)$ is bounded by $R_{\max} = \max_{p(v_2|s_2)} I(V_2; S_2|S_1) = H(S_2|S_1)$, allowing $R'$ to be greater than $R_{\max}$ cannot improve $C_2^{lb}$ any more, i.e., $C_2^{lb}(R' = R_{\max}) = C_2^{lb}(R' > R_{\max})$. Therefore, it is enough to allow $R' = R_{\max}$ to achieve $C_2^{lb}$, as if the encoder is fully informed with $S_2$.

3) Although $C_2^{lb}$ is a lower bound on the capacity, it can be significantly greater than the Cover-Chiang and the Gelfand-Pinsker rates for some channel models, as can be seen in this example. Moreover, we can actually state that $C_2^{lb}$ is always greater than or equal to the Gelfand-Pinsker and the Cover-Chiang rates. This is due to the fact that when $R' = 0$, $C_2^{lb}$ coincides with the Cover-Chiang rate, which, in its turn, is always greater than or equal to the Gelfand-Pinsker rate; since $C_2^{lb}$ is also non-decreasing in $R'$, our assertion holds.
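The saturation rate $R_{\max} = H(S_2|S_1)$ quoted in the caption of Figure 6 can be reproduced with a few lines. The sketch below assumes the joint PMF implied by $S_1 \sim$ Bernoulli(0.5) and $\Pr\{S_2 \neq S_1\} = 0.8$; the function name is illustrative.

```python
import math

def conditional_entropy(p_s1s2):
    """H(S2|S1) in bits for a joint PMF given as p_s1s2[s1][s2]."""
    total = 0.0
    for row in p_s1s2:
        p_s1 = sum(row)
        for p in row:
            if p > 0:
                total -= p * math.log2(p / p_s1)
    return total

# Joint PMF for S1 ~ Bernoulli(0.5) and Pr{S2 != S1} = 0.8.
p_s1s2 = [[0.1, 0.4],
          [0.4, 0.1]]

r_max = conditional_entropy(p_s1s2)
print(round(r_max, 4))
```

The result matches the value $H(S_2|S_1) = 0.7219$ bit, above which increasing $R'$ cannot raise the lower bound.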

Example 2 (Source coding Case 1 for a binary-symmetric source and Hamming distortion). Consider the source $X = S_1 \oplus S_2$, where $S_1, S_2 \sim$ i.i.d. Bernoulli(0.5), and consider the problem setting depicted in Case 1 of the source coding problems. It is sufficient for the decoder to reconstruct $S_1$ with distortion $E[d(S_1, \hat{S}_1)] \leq D$ in order to reconstruct $X$ with the same distortion. Furthermore, the two rate-distortion problem settings illustrated in Figure 7 are equivalent.
Fig. 7: The equivalent rate-distortion problem for Case 1 for the source $X = S_1 \oplus S_2$, where $S_1, S_2 \sim$ i.i.d. Bernoulli(0.5).



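For every achievable rate in Setting 1, $E\big[d(S_1, \hat{S}_1)\big] \leq D$. Denote $\hat{X} \triangleq \hat{S}_1 \oplus S_2$; then $d(S_1, \hat{S}_1) = S_1 \oplus \hat{S}_1 = (S_1 \oplus S_2) \oplus (\hat{S}_1 \oplus S_2) = X \oplus \hat{X} = d(X, \hat{X})$ and, therefore, $E\big[d(S_1, \hat{S}_1)\big] \leq D$ in Setting 1 implies $E\big[d(X, \hat{X})\big] \leq D$ in Setting 2. In the same way, for Setting 2, denote $\hat{S}_1 = \hat{X} \oplus S_2$. Then $d(X, \hat{X}) = X \oplus \hat{X} = S_1 \oplus \hat{S}_1$ and, therefore, $E\big[d(X, \hat{X})\big] \leq D$ in Setting 2 implies $E\big[d(S_1, \hat{S}_1)\big] \leq D$ in Setting 1. Hence, we can conclude that the two settings are equivalent and, for any given $0 \leq D$ and $0 \leq R'$, the rate-distortion function is

$$R(D) = \begin{cases} 1 - H(D) - R', & 1 - H(D) - R' \geq 0, \\ 0, & 1 - H(D) - R' < 0. \end{cases} \tag{43}$$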

In Figure 8 we present the plot resulting for this example. It is easy to verify that the Wyner & Ziv rate and the Cover & Chiang rate for this setting are $R_{WZ}(D) = R_{CC}(D) = \max\{1 - H(D), 0\}$.

Fig. 8: Example 2. Source coding Case 1 for a binary-symmetric source and Hamming distortion. The source is given by $X = S_1 \oplus S_2$, where $S_1, S_2 \sim$ Bernoulli(0.5). The graph shows the rate-distortion function for different values of $R'$.
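The closed form (43) is simple enough to evaluate directly. The following sketch assumes that $H(\cdot)$ is the binary entropy function in bits; the function names are illustrative. Setting $R' = 0$ recovers the Wyner-Ziv and Cover-Chiang rate $\max\{1 - H(D), 0\}$ stated above.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def rate_distortion_example2(D, r_prime):
    """R(D) of (43): max{1 - H(D) - R', 0}."""
    return max(1.0 - binary_entropy(D) - r_prime, 0.0)

for r_prime in (0.0, 0.2, 0.4):
    row = [round(rate_distortion_example2(D, r_prime), 3)
           for D in (0.0, 0.05, 0.11, 0.25, 0.5)]
    print("R' =", r_prime, "->", row)
```

Each increase of $R'$ shifts the whole curve down by the same amount until it is clipped at zero.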



Example 3 (Geometric programming and the Wyner-Ziv problem). Consider the traditional Wyner-Ziv [1] problem where the source, $X$, and the side information, $S$, are distributed according to $X \sim$ Bernoulli(0.5) and $\Pr\{S \neq X\} = 0.3$. We calculated the rate-distortion function, $R(D) = \min_{p(u|x)p(\hat{x}|u,s)} I(U; X|S)$ s.t. $E\big[d(X, \hat{X})\big] \leq D$, by using three different methods: first by using [1, Theorem II], second by using [25, Proposition 3] and third by using the geometric programming solution we introduced in Theorem 3. The plot resulting from this computation is brought in Figure 9.

Fig. 9: Example 3. Geometric programming and Wyner-Ziv. The source and the side information are distributed $X \sim$ Bernoulli(0.5) and $\Pr\{S \neq X\} = 0.3$.

It can be seen in the figure that the geometric program, which was calculated according to Theorem 3, is tight to the Wyner-Ziv rate.
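For this doubly symmetric binary source there is a well-known closed form against which any numerical solution can be compared: the Wyner-Ziv rate is the lower convex envelope of $g(D) = h(p * D) - h(D)$ on $0 \leq D \leq p$ together with the point $(p, 0)$, where $p = \Pr\{S \neq X\}$, $h$ is the binary entropy and $a * b = a(1-b) + (1-a)b$. The sketch below evaluates that envelope by a one-dimensional grid search. Whether this is exactly how the authors evaluated [1, Theorem II] is an assumption, and all names are illustrative.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def conv(a, b):
    """Binary convolution a * b = a(1-b) + (1-a)b."""
    return a * (1.0 - b) + (1.0 - a) * b

def g(d, p):
    """g(d) = h(p * d) - h(d)."""
    return h2(conv(p, d)) - h2(d)

def wyner_ziv_binary(D, p, steps=2000):
    """Lower convex envelope of g and the point (p, 0), evaluated at D."""
    if D >= p:
        return 0.0
    best = g(D, p)
    for k in range(steps + 1):
        beta = D * k / steps              # candidate tangent point in [0, D]
        theta = (p - D) / (p - beta)      # time-sharing weight on beta
        best = min(best, theta * g(beta, p))
    return best

for D in (0.0, 0.1, 0.2, 0.3):
    print(D, round(wyner_ziv_binary(D, 0.3), 4))
```

The value at $D = 0$ equals $H(X|S) = h(0.3)$ and the rate reaches zero at $D = p$, where the decoder can simply output the side information.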



Fig. 10: Example 4. Source coding Case 1 with binary symmetric source generation, as given in (44).



Example 4 (Geometric programming and source coding Case 1). Again, consider a rate-distortion problem as outlined in Case 1 with a binary-symmetric source and Hamming distortion. The source, $X$, is the output of the system illustrated in Figure 10: $S_1, S_2 \sim$ i.i.d. Bernoulli(0.5), $S_2$ is controlling a switch, $Z_0 \sim$ Bernoulli(0.3) and $Z_1 \sim$ Bernoulli(0.001). The output of this system can be expressed as

$$X = \begin{cases} S_1 \oplus Z_0, & S_2 = 0, \\ S_1 \oplus Z_1, & S_2 = 1. \end{cases} \tag{44}$$

This source coding problem was introduced by Cheng, Stankovic and Xiong [22] for the case where the users are not allowed to share with each other their partial side information ($R' = 0$). The rate-distortion expression for this problem is $R_1(D) = \min I(U; X, S_1|V_1) - I(U; S_2|V_1)$, where the minimization is over all $p(v_1|s_1)p(u|x, s_1, v_1)p(\hat{x}|u, s_2, v_1)$ s.t. $R' \geq I(V_1; S_1|S_2)$ and $E\left[\frac{1}{n}\sum_{i=1}^{n} d(X_i, \hat{X}_i)\right] \leq D$. We solve this example by using the geometric programming expression we developed in Theorem 3. The algorithm we developed in order to solve this problem uses some of the main principles we used in the algorithm that we developed for Example 1 (Algorithm 1), which is detailed in Section VI. For this reason, we bring here only a summary of the algorithm for this example.

First, as claimed in Section IV, it is possible to write the expression for the rate-distortion function as $R(D) = \min I(T; X, S_1|V_1) - I(T; S_2|V_1)$, where the minimization is over all $w(v_1|s_1)q(t|x, s_1, v_1)$ s.t. $R' \geq I(V_1; S_1|S_2)$ and $E\left[\frac{1}{n}\sum_{i=1}^{n} d\big(X_i, T(S_{2,i}, V_{1,i})\big)\right] \leq D$. The variable $T$ is the mapping $T: \mathcal{S}_2 \times \mathcal{V}_1 \to \hat{\mathcal{X}}$. It can be verified that for every fixed probability, $w(v_1|s_1)$, the function $I(T; X, S_1|V_1) - I(T; S_2|V_1)$ is a convex function of $q(t|x, s_1, v_1)$. Now, we construct a fine grid of probabilities $w(v_1|s_1)$, and we keep those $w(v_1|s_1)$ for which $R' \geq I(V_1; S_1|S_2) \geq R' - \epsilon$ in the array $\mathcal{W}^*$. At this point, for every $w(v_1|s_1) \in \mathcal{W}^*$ that we kept, we let $R_w(D)$ be the solution of the following geometric program:

$$\begin{aligned}
\text{maximize} \quad & \sum_{x,s_1,v_1} \alpha_{x,s_1,v_1}\, p(x, s_1, v_1) - \gamma D \\
\text{subject to} \quad & \alpha_{x,s_1,v_1} + \sum_{s_2} p(s_2|x, s_1, v_1)\Big[\log p(x, s_1|s_2, v_1) - \gamma\, d\big(x, t(s_2, v_1)\big) - y_{x,s_1,s_2,v_1,t}\Big] \leq 0 \quad \forall x, s_1, v_1, t, \\
& \log\Big(\sum_{x,s_1} \exp\{y_{x,s_1,s_2,v_1,t}\}\Big) \leq 0 \quad \forall s_2, v_1, t, \\
& \gamma \geq 0,
\end{aligned} \tag{45}$$

where the variables of the maximization are $\boldsymbol{\alpha} \in \mathbb{R}^{|\mathcal{X}||\mathcal{S}_1||\mathcal{V}_1|}$, $\gamma \in \mathbb{R}$ and $\mathbf{y} \in \mathbb{R}^{|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2||\mathcal{V}_1||\mathcal{T}|}$. It can be verified that this geometric program is a generalization of the geometric program we developed in Theorem 3 and that it corresponds to the problem of minimizing $I(T; X, S_1|V_1) - I(T; S_2|V_1)$ over $q(t|x, s_1, v_1)$ s.t. $E\left[\frac{1}{n}\sum_{i=1}^{n} d(X_i, \hat{X}_i)\right] \leq D$ (for a fixed probability $w(v_1|s_1)$). Therefore, all we are left to do now is to declare

$$R(D) = \min_{w(v_1|s_1) \in \mathcal{W}^*} R_w(D). \tag{46}$$

This concludes the summary of the algorithm for solving this example.

The numerical result of the calculation of this rate-distortion function is brought in Figure 11.

VI. Semi-Iterative Algorithm 

In this section we provide algorithms that numerically calculate the lower bound on the capacity of Case 2 of the channel coding problems. The calculation of the Gelfand-Pinsker and the Wyner-Ziv problems has been addressed in many papers in the past, including [5], [20], [21] and [22]. All these algorithms are based on Arimoto's [19] and Blahut's [18] algorithms and on the fact that the Wyner-Ziv and the Gelfand-Pinsker problems can be presented as convex optimization problems. In contrast, our problems are not convex in all of their optimization variables and, therefore, cannot be presented as convex optimization problems. In order to solve our problems we devised a different approach, which combines a grid search and a Blahut-Arimoto-like algorithm. In this section, we provide the mathematical justification for those two algorithms. Other algorithms to numerically compute the channel capacity or the rate-distortion function of the rest of the cases presented in this paper can be derived using the principles that we describe in this section.

Fig. 11: Example 4. Geometric programming and source coding Case 1. The source $X$ is depicted in Figure 10 and the distortion is the Hamming distortion.

A. An algorithm for computing the lower bound on the capacity of Case 2 

Fig. 12: Channel coding: Case 2. $C_2^{lb} = \max I(U; Y, S_2|V_2) - I(U; S_1|V_2)$, where the maximization is over all PMFs $w(v_2|s_2)p(u|s_1, v_2)p(x|s_1, v_2, u)$ such that $R' \geq I(V_2; S_2|S_1)$.

Consider the channel in Figure 12, described by $p(y|x, s_1, s_2)$, and consider the joint PMF $p(s_1, s_2)$. The capacity of this channel is lower bounded by $\max I(U; Y, S_2|V_2) - I(U; S_1|V_2)$, where the maximization is over all PMFs $p(s_1, s_2)w(v_2|s_2)p(u|s_1, v_2)p(x|s_1, v_2, u)p(y|x, s_1, s_2)$ such that $R' \geq I(V_2; S_2|S_1)$. Notice that the lower bound expression is not concave in $w(v_2|s_2)$, which is the main difficulty in computing it. We first present an outline of the semi-iterative algorithm we developed, then we present the mathematical background and justification for the algorithm and, finally, we present the detailed algorithm.

For any fixed PMF $w(v_2|s_2)$ denote

$$R_w \triangleq I(V_2; S_2|S_1), \tag{47}$$

$$C_{2,w}^{lb} \triangleq \max_{p(u|s_1,v_2)p(x|u,s_1,v_2)} I(U; Y, S_2|V_2) - I(U; S_1|V_2). \tag{48}$$

Then, the lower bound on the capacity, $C_2^{lb}(R')$, can be expressed as

$$C_2^{lb}(R') = \max_{\substack{w(v_2|s_2) \\ \text{s.t. } R' \geq R_w}} \; \max_{p(u|s_1,v_2)p(x|u,s_1,v_2)} \big[I(U; Y, S_2|V_2) - I(U; S_1|V_2)\big] = \max_{\substack{w(v_2|s_2) \\ \text{s.t. } R' \geq R_w}} C_{2,w}^{lb}. \tag{49}$$

The outline of the algorithm is as follows: for any given rate $R' \leq H(S_2|S_1)$, $\epsilon > 0$ and $\delta > 0$,

1) Establish a fine and uniformly spaced grid of legal PMFs, $w(v_2|s_2)$, and denote the set of all of those PMFs as $\mathcal{W}$.

2) Establish the set $\mathcal{W}^* := \{w(v_2|s_2) \mid w(v_2|s_2) \in \mathcal{W} \text{ and } R' - \epsilon \leq R_w \leq R'\}$. This set is the set of all PMFs $w(v_2|s_2)$ such that $R_w$ is $\epsilon$-close to $R'$ from below. If $\mathcal{W}^*$ is empty, go back to step 1 and make the grid finer. Otherwise, continue.

3) For every $w(v_2|s_2) \in \mathcal{W}^*$, perform a Blahut-Arimoto-like optimization to find $C_{2,w}^{lb}$ with accuracy of $\delta$.

4) Declare the computed value of $C_2^{lb}(R')$ to be $\max_{w(v_2|s_2) \in \mathcal{W}^*} C_{2,w}^{lb}$.

Remarks: (a) We considered only those $R'$ such that $R' \leq H(S_2|S_1)$, since $H(S_2|S_1)$ is the maximal value that $I(V_2; S_2|S_1)$ takes. The interpretation of this is that if the encoder is informed with $S_1$, we cannot increase its side information about $S_2$ by more than $H(S_2|S_1)$. Therefore, for any $H(S_2|S_1) \leq R'$, we can limit $R'$ to be equal to $H(S_2|S_1)$ in order to compute the capacity. (b) Since $C_{2,w}^{lb}$ is continuous in $w(v_2|s_2)$ and bounded (for example, by $I(X; Y|S_1, S_2)$ from above and by $I(X; Y)$ from below), the value computed in step 4 can be arbitrarily close to $C_2^{lb}(R')$ for $\epsilon \to 0$, $\delta \to 0$ and $|\mathcal{W}| \to \infty$.
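Steps 1 and 2 of the outline can be sketched as follows. This is a minimal illustration, assuming binary $S_2$ and $V_2$ so that $w(v_2|s_2)$ is described by two numbers, and reusing the joint PMF of Example 1; the function names, grid resolution and parameter values are illustrative and not taken from the paper. The quantity $R_w = I(V_2; S_2|S_1)$ is computed directly from the joint PMF $p(s_1, s_2)w(v_2|s_2)$.

```python
import math

def cond_mutual_info(p_s1s2, w):
    """I(V2;S2|S1) in bits for the joint PMF p(s1,s2) w(v2|s2); w[s2][v2]."""
    total = 0.0
    for row in p_s1s2:                       # one row per value of s1
        p_s1 = sum(row)
        for v2 in range(len(w[0])):
            # p(v2|s1) = sum_s2 p(s2|s1) w(v2|s2)
            p_v2_s1 = sum(row[s2] / p_s1 * w[s2][v2] for s2 in range(len(row)))
            for s2, p in enumerate(row):
                if p > 0 and w[s2][v2] > 0:
                    total += p * w[s2][v2] * math.log2(w[s2][v2] / p_v2_s1)
    return total

def build_w_star(p_s1s2, r_prime, eps, steps):
    """Uniform grid over binary w(v2|s2); keep PMFs with R'-eps <= R_w <= R'."""
    grid = [k / steps for k in range(steps + 1)]
    w_star = []
    for a in grid:                           # a = w(v2=1 | s2=0)
        for b in grid:                       # b = w(v2=1 | s2=1)
            w = [[1.0 - a, a], [1.0 - b, b]]
            r_w = cond_mutual_info(p_s1s2, w)
            if r_prime - eps <= r_w <= r_prime:
                w_star.append((w, r_w))
    return w_star

p_s1s2 = [[0.1, 0.4], [0.4, 0.1]]
w_star = build_w_star(p_s1s2, r_prime=0.3, eps=0.05, steps=20)
print(len(w_star), "grid points kept")
```

Each retained $w$ would then be passed to the inner Blahut-Arimoto-like optimization of step 3. If the returned list is empty, the grid is refined, as in step 2.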

Mathematical background and justification

Here we focus on finding the lower bound on the capacity of the channel for a fixed distribution $w(v_2|s_2)$, i.e., finding $C_{2,w}^{lb}$. Note that the mutual information expression $I(U; Y, S_2|V_2) - I(U; S_1|V_2)$ is concave in $p(u|s_1, v_2)$ and convex in $p(x|u, s_1, v_2)$. Therefore, a standard convex maximization technique is not applicable for this problem. However, according to Dupuis, Yu and Willems [21], we can write the expression for the lower bound as $C_{2,w}^{lb} = \max_{q(t|s_1,v_2)} I(T; Y, S_2|V_2) - I(T; S_1|V_2)$, where $q(t|s_1, v_2)$ is a probability distribution over the set of all possible strategies $t: \mathcal{S}_1 \times \mathcal{V}_2 \to \mathcal{X}$, the input symbol $X$ is selected using $x = t(s_1, v_2)$ and $p(y|x, s_1, s_2) = p(y|x, s_1, s_2, v_2) = p(y|t(s_1, v_2), s_1, s_2, v_2)$. Now, since $I(T; Y, S_2|V_2) - I(T; S_1|V_2)$ is concave in $q(t|s_1, v_2)$, we can use convex optimization methods to derive $C_{2,w}^{lb}$.

Denote the PMF

$$p(s_1, s_2, v_2, t, y) = p(s_1, s_2)\,w(v_2|s_2)\,q(t|s_1, v_2)\,p(y|t, s_1, s_2, v_2), \tag{50}$$

and denote also

$$J_w(q, Q) = \sum_{s_1, s_2, v_2, t, y} p(s_1, s_2, v_2, t, y) \log \frac{Q(t|y, s_2, v_2)}{q(t|s_1, v_2)}, \tag{51}$$

$$Q^*(t|y, s_2, v_2) = \frac{\sum_{s_1} p(s_1, s_2, v_2, t, y)}{\sum_{s_1, t'} p(s_1, s_2, v_2, t', y)}. \tag{52}$$

Notice that $Q^*(t|y, s_2, v_2)$ is a marginal distribution of $p(s_1, s_2, v_2, t, y)$ and that $J_w(q, Q^*) = I(T; Y, S_2|V_2) - I(T; S_1|V_2)$ for the joint PMF $p(s_1, s_2, v_2, t, y)$.

The following lemma is the key for the iterative algorithm.

Lemma 3.

$$C_{2,w}^{lb} = \sup_{q(t|s_1,v_2)} \; \max_{Q(t|y,s_2,v_2)} J_w(q, Q). \tag{53}$$

The proof for this is brought by Yeung in [29]. In addition, Yeung shows that the two-step alternating optimization procedure converges monotonically to the global optimum if the optimization function is concave. Hence, if we show that $J_w(q, Q)$ is concave, we can maximize it using an alternating maximization algorithm over $q$ and $Q$.

Lemma 4. The function $J_w(q, Q)$ is concave in $q$ and $Q$ simultaneously.

We can now proceed to calculate the steps in the iterative algorithm.

Lemma 5. For a fixed $q$, $J_w(q, Q)$ is maximized for $Q = Q^*$.

Proof: The above follows from the fact that $Q^*$ is a marginal distribution of $p(s_1, s_2)w(v_2|s_2)q(t|s_1, v_2)p(y|t, s_1, s_2, v_2)$ and from the property of the K-L divergence $D(Q^* \| Q) \geq 0$. ■

Lemma 6. For a fixed $Q$, $J_w(q, Q)$ is maximized for $q = q^*$, where $q^*$ is defined by

$$q^*(t|s_1, v_2) = \frac{\prod_{s_2, y} Q(t|y, s_2, v_2)^{p(s_2|s_1, v_2)\,p(y|t, s_1, s_2, v_2)}}{\sum_{t'} \prod_{s_2, y} Q(t'|y, s_2, v_2)^{p(s_2|s_1, v_2)\,p(y|t', s_1, s_2, v_2)}}, \tag{54}$$

and

$$p(s_2|s_1, v_2) = \frac{p(s_1, s_2)\,w(v_2|s_2)}{\sum_{s_2'} p(s_1, s_2')\,w(v_2|s_2')}. \tag{55}$$

Define $U_w(q)$ in the following way:

$$U_w(q) = \sum_{s_1, v_2} p(s_1, v_2) \max_{t} \sum_{s_2, y} p(s_2|s_1, v_2)\,p(y|t, s_1, s_2, v_2) \log \frac{Q^*(t|y, s_2, v_2)}{q(t|s_1, v_2)}, \tag{56}$$

where $Q^*$ is given in (52), and $p(s_1, v_2)$ and $p(s_2|s_1, v_2)$ are marginal distributions of the joint PMF $p(s_1, s_2, v_2, t, y) = p(s_1, s_2)w(v_2|s_2)q(t|s_1, v_2)p(y|t, s_1, s_2, v_2)$. The following lemma will help us to define a termination condition for the algorithm.

Lemma 7. For every $q(t|s_1, v_2)$ the function $U_w(q)$ is an upper bound on $C_{2,w}^{lb}$ and converges to $C_{2,w}^{lb}$ for a large enough number of iterations.
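The alternating-maximization pattern with an upper-bound stopping rule is easiest to see on the classical Blahut-Arimoto algorithm for a channel without states. The sketch below is that classical algorithm, not the state-dependent procedure developed in this section: the lower bound plays the role of $J_w(q, Q)$, the upper bound plays the role of $U_w(q)$, and the loop stops once the gap falls below $\delta$. The function name and the two test channels are illustrative assumptions.

```python
import math

def blahut_arimoto(P, delta=1e-9, max_iter=10000):
    """Classical Blahut-Arimoto capacity computation (in bits).

    P[x][y] is the channel transition matrix p(y|x). Returns
    (lower_bound, upper_bound, input_pmf) with upper - lower < delta
    on normal termination.
    """
    nx, ny = len(P), len(P[0])
    r = [1.0 / nx] * nx                      # start from the uniform input PMF
    lower = upper = 0.0
    for _ in range(max_iter):
        # Output PMF induced by the current input PMF.
        p_y = [sum(r[x] * P[x][y] for x in range(nx)) for y in range(ny)]
        # c(x) = exp( D( p(.|x) || p_y ) ), computed in nats.
        c = [math.exp(sum(P[x][y] * math.log(P[x][y] / p_y[y])
                          for y in range(ny) if P[x][y] > 0))
             for x in range(nx)]
        z = sum(r[x] * c[x] for x in range(nx))
        lower = math.log2(z)                 # lower bound on capacity
        upper = math.log2(max(c))            # upper bound on capacity
        if upper - lower < delta:            # termination test
            break
        r = [r[x] * c[x] / z for x in range(nx)]   # multiplicative update
    return lower, upper, r

bsc = [[0.9, 0.1], [0.1, 0.9]]               # binary symmetric channel, eps = 0.1
z_channel = [[1.0, 0.0], [0.5, 0.5]]         # Z-channel with eps = 0.5

print(blahut_arimoto(bsc))
print(blahut_arimoto(z_channel))
```

In the state-dependent setting the update of the input PMF is replaced by the update (54) of $q(t|s_1, v_2)$, and the two bounds are replaced by (51) and (56).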

B. Semi-iterative algorithm 

The algorithm for finding $C_2^{lb}(R')$ is brought in Algorithm 1. Notice that the result of this algorithm can be arbitrarily close to $C_2^{lb}(R')$ for $\epsilon \to 0$, $\delta \to 0$ and $|\mathcal{W}| \to \infty$.

Algorithm 1 Numerically calculating $C_2^{lb}(R')$

1: Choose $\epsilon > 0$, $\delta > 0$
2: Set $R' \leftarrow \min\{R', H(S_2|S_1)\}$  ▷ the amount of information needed for the encoder to know $S_2$ given $S_1$
3: Set $C \leftarrow -\infty$
4: Establish a fine and uniformly spaced grid of legal PMFs $w(v_2|s_2)$ and name it $\mathcal{W}$
5: for all $w$ in $\mathcal{W}$ do
6:   Compute $R_w$ using $R_w = I(V_2; S_2) - I(V_2; S_1)$
7:   if $R' - \epsilon \leq R_w \leq R'$ then
8:     Set $Q(t|y, s_2, v_2)$ to be a uniform distribution over $\{1, 2, \ldots, |\mathcal{T}|\}$, where $\mathcal{T}$ is the alphabet of $t$, i.e., $Q(t|y, s_2, v_2) = \frac{1}{|\mathcal{T}|}$, $\forall t, y, s_2, v_2$
9:     repeat
10:      Set $q(t|s_1, v_2) \leftarrow q^*(t|s_1, v_2)$ using (54)
11:      Set $Q(t|y, s_2, v_2) \leftarrow Q^*(t|y, s_2, v_2)$ using $Q^*(t|y, s_2, v_2) = \frac{\sum_{s_1} p(s_1, s_2, v_2, t, y)}{\sum_{s_1, t'} p(s_1, s_2, v_2, t', y)}$
12:      Compute $J_w(q, Q)$ using $J_w(q, Q) = \sum_{s_1, s_2, v_2, t, y} p(s_1, s_2, v_2, t, y) \log \frac{Q(t|y, s_2, v_2)}{q(t|s_1, v_2)}$
13:      Compute $U_w(q)$ using $U_w(q) = \sum_{s_1, v_2} p(s_1, v_2) \max_t \sum_{s_2, y} p(s_2|s_1, v_2)\,p(y|t, s_1, s_2, v_2) \log \frac{Q^*(t|y, s_2, v_2)}{q(t|s_1, v_2)}$
14:    until $U_w(q) - J_w(q, Q) < \delta$
15:    if $C < J_w(q, Q)$ then
16:      Set $C \leftarrow J_w(q, Q)$
17:    end if
18:  end if
19: end for
20: if $C < 0$ then  ▷ there is no PMF $w(v_2|s_2) \in \mathcal{W}$ such that $R_w$ is $\epsilon$-close to $R'$ from below
21:   go to line 4 and make the grid finer
22: end if
23: Declare $C$ to be the computed value of $C_2^{lb}(R')$



VII. Open Problems 

In this section we discuss the generalization of the channel capacity and the rate-distortion problems that we presented in Section III. We now consider the cases where the encoder and the decoder are informed with both a rate-limited description of the ESI and a rate-limited description of the DSI simultaneously, as illustrated in Figure 13. Although proofs for the converses are not provided in this paper and are considered as open problems, we do provide achievability schemes for both problems.

A. A lower bound on the capacity of a channel with two-sided increased partial side information 
Fig. 13: A lower bound on the capacity of a channel with two-sided increased partial side information: $C_{12} \geq \max I(U; Y, S_2|V_1, V_2) - I(U; S_1|V_1, V_2)$, where the maximization is over all PMFs $p(v_1|s_1)p(v_2|s_2)p(u|s_1, v_1, v_2)p(x|u, s_1, v_1, v_2)$ such that $R_1' \geq I(V_1; S_1) - I(V_1; Y, S_2, V_2)$ and $R_2' \geq I(V_2; S_2) - I(V_2; S_1, V_1)$.

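Consider the channel illustrated in Figure 13, where $(S_{1,i}, S_{2,i})$ are i.i.d. $\sim p(s_1, s_2)$. The encoder is informed with the ESI ($S_1^n$) and rate-limited DSI, and the decoder is informed with the DSI ($S_2^n$) and rate-limited ESI. An $(n, 2^{nR}, 2^{nR_1'}, 2^{nR_2'})$ code for the discussed channel consists of three encoding maps:

$$\begin{aligned}
f_{v_1} &: \mathcal{S}_1^n \to \{1, 2, \ldots, 2^{nR_1'}\}, \\
f_{v_2} &: \mathcal{S}_2^n \to \{1, 2, \ldots, 2^{nR_2'}\}, \\
f &: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR_2'}\} \to \mathcal{X}^n,
\end{aligned}$$

and a decoding map:

$$g: \mathcal{Y}^n \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR_1'}\} \to \{1, 2, \ldots, 2^{nR}\}.$$

Fact 1: The channel capacity, $C_{12}$, of this channel coding setup is bounded from below as follows:

$$C_{12} \geq \max_{\substack{p(v_1|s_1)p(v_2|s_2)p(u|s_1,v_1,v_2)p(x|u,s_1,v_1,v_2) \\ \text{s.t. } R_1' \geq I(V_1;S_1) - I(V_1;Y,S_2,V_2) \\ R_2' \geq I(V_2;S_2) - I(V_2;S_1,V_1)}} I(U; Y, S_2|V_1, V_2) - I(U; S_1|V_1, V_2), \tag{57}$$

for some joint distribution $p(s_1, s_2, v_1, v_2, u, x, y)$, where $U$, $V_1$ and $V_2$ are some auxiliary random variables.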

The proof of the achievability closely follows the proofs given in Appendix B and, therefore, we only provide an outline of it. The main steps of the achievability scheme are outlined in the following.

Sketch of proof of Achievability for Fact 1: (a) The ESI encoder wants to describe $S_1^n$ to the decoder with a rate of $R_1'$. We generate $2^{n(I(V_1;S_1)+\epsilon)}$ sequences $V_1^n$ i.i.d. $\sim p(v_1)$ and randomly distribute them into $2^{n(I(V_1;S_1) - I(V_1;Y,S_2,V_2) + 2\epsilon)}$ bins; each bin contains $2^{n(I(V_1;Y,S_2,V_2) - \epsilon)}$ codewords. The ESI encoder is given the sequence $s_1^n$ and first looks for a sequence $v_1^n$ that is jointly typical with $s_1^n$. If there is such a codeword, the ESI encoder sends the index of the bin that contains $v_1^n$ to the decoder. The decoder, given $y^n, s_2^n, v_2^n$, looks for a unique codeword in the received bin that is jointly typical with $y^n, s_2^n, v_2^n$. Since there are more than $2^{nI(V_1;S_1)}$ sequences $V_1^n$, the ESI encoder is assured with high probability to find a sequence $v_1^n$ such that $(v_1^n, s_1^n) \in T_\epsilon(V_1, S_1)$. Since, in addition, there are fewer than $2^{nI(V_1;Y,S_2,V_2)}$ codewords in the bin, the decoder is assured to find a unique sequence $v_1^n$ in the bin such that $(v_1^n, y^n, s_2^n, v_2^n) \in T_\epsilon(V_1, Y, S_2, V_2)$ with high probability. Therefore, the constraint on the shared ESI is maintained if $R_1' \geq I(V_1; S_1) - I(V_1; Y, S_2, V_2)$.
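The rate constraint in step (a) comes from counting bins: the ESI encoder sends only a bin index, so its rate is the exponent of the number of bins, i.e., the codebook size divided by the bin size. As a worked check, using the codebook and bin sizes of step (a):

```latex
\frac{2^{\,n\left(I(V_1;S_1)+\epsilon\right)}}{2^{\,n\left(I(V_1;Y,S_2,V_2)-\epsilon\right)}}
  = 2^{\,n\left(I(V_1;S_1)-I(V_1;Y,S_2,V_2)+2\epsilon\right)}
\quad\Longrightarrow\quad
R_1' = I(V_1;S_1)-I(V_1;Y,S_2,V_2)+2\epsilon .
```

Letting $\epsilon \to 0$ gives the condition $R_1' \geq I(V_1; S_1) - I(V_1; Y, S_2, V_2)$. The same counting argument yields the bin counts in steps (b) and (c).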

(b) The DSI encoder wants to describe $S_2^n$ to the channel's encoder with a rate of $R_2'$. We generate $2^{n(I(V_2;S_2)+\epsilon)}$ sequences $V_2^n \sim$ i.i.d. $p(v_2)$ and randomly distribute them into $2^{n(I(V_2;S_2) - I(V_2;S_1,V_1) + 2\epsilon)}$ bins; each bin contains $2^{n(I(V_2;S_1,V_1) - \epsilon)}$ codewords. The DSI encoder, given $s_2^n$, first looks for a sequence $v_2^n$ that is jointly typical with $s_2^n$. If there is such a codeword, the DSI encoder sends the index of the bin where $v_2^n$ is located to the channel's encoder. The channel's encoder, given $s_1^n, v_1^n$, looks for a unique sequence $v_2^n$ in the received bin that is jointly typical with $s_1^n, v_1^n$. Since there are more than $2^{nI(V_2;S_2)}$ sequences $V_2^n$, the DSI encoder is assured with high probability to find such a sequence $v_2^n$ such that $(v_2^n, s_2^n) \in T_\epsilon(V_2, S_2)$. In its turn, the channel's encoder is also assured with high probability to find the unique sequence $v_2^n$ in its received bin such that $(v_2^n, s_1^n, v_1^n) \in T_\epsilon(V_2, S_1, V_1)$, since there are fewer than $2^{nI(V_2;S_1,V_1)}$ codewords $V_2^n$ in the bin. Therefore, the constraint on the shared DSI is maintained if $R_2' \geq I(V_2; S_2) - I(V_2; S_1, V_1)$.

(c) The encoder wants to send the message $W$ to the decoder. For each $v_1^n, v_2^n$ we generate $2^{n(I(U;Y,S_2|V_1,V_2) - \epsilon)}$ sequences $U^n$ using the PMF $p(u^n|v_1^n, v_2^n) = \prod_{i=1}^{n} p(u_i|v_{1,i}, v_{2,i})$ and randomly distribute them into $2^{n(I(U;Y,S_2|V_1,V_2) - I(U;S_1|V_1,V_2) - 2\epsilon)}$ bins; each bin contains $2^{n(I(U;S_1|V_1,V_2) + \epsilon)}$ codewords. The encoder, given $s_1^n, v_1^n, v_2^n$ and the message $W$, looks in the bin number $W$ for a sequence $u^n$ that is jointly typical with $s_1^n, v_1^n, v_2^n$ and sends $x_i = f(u_i, s_{1,i}, v_{1,i}, v_{2,i})$ over the channel at time $i$. The decoder receives $y^n, s_2^n, v_1^n, v_2^n$ and first looks for a unique sequence $u^n$ that is jointly typical with $y^n, s_2^n, v_1^n, v_2^n$. Upon finding the desired sequence $u^n$, the decoder declares $\hat{W}$ to be the index of the bin that contains $u^n$. Having fewer than $2^{nI(U;Y,S_2|V_1,V_2)}$ sequences $U^n$ assures with high probability that the decoder will identify a unique sequence $u^n$ such that $(u^n, y^n, s_2^n, v_1^n, v_2^n) \in T_\epsilon(U, Y, S_2, V_1, V_2)$. This is also valid because the Markov relation $(U, V_1, V_2) - (X, S_1, S_2) - Y$ implies that $(u^n, v_1^n, v_2^n, x^n, s_1^n, s_2^n, y^n) \in T_\epsilon(U, V_1, V_2, X, S_1, S_2, Y)$. In addition, since in each of the encoder's bins there are more than $2^{nI(U;S_1|V_1,V_2)}$ codewords $U^n$, the encoder is assured with high probability to find a sequence $u^n$ in the bin indexed $W$ such that $(u^n, s_1^n, v_1^n, v_2^n) \in T_\epsilon(U, S_1, V_1, V_2)$. We can conclude that if $R < I(U; Y, S_2|V_1, V_2) - I(U; S_1|V_1, V_2)$ is maintained, then reliable communication over the channel is achievable; namely, it is possible to find a sequence of codes such that $\Pr\{\hat{W} \neq W\}$ goes to zero as the block length goes to infinity. This concludes the sketch of the achievability.

B. An upper bound on the rate-distortion with two-sided increased partial side information

Consider the rate-distortion problem illustrated in Figure 14, where the source $X$ and the side information $S_1, S_2$ are distributed $(X_i, S_{1,i}, S_{2,i}) \sim$ i.i.d. $p(x, s_1, s_2)$. The encoder is informed with the ESI ($S_1^n$) and rate-limited DSI, and the decoder is informed with the DSI ($S_2^n$) and rate-limited ESI. An $(n, 2^{nR}, 2^{nR_1'}, 2^{nR_2'}, D)$ code for the discussed rate-distortion problem consists of three encoding maps:

$$\begin{aligned}
f_{v_1} &: \mathcal{S}_1^n \to \{1, 2, \ldots, 2^{nR_1'}\}, \\
f_{v_2} &: \mathcal{S}_2^n \to \{1, 2, \ldots, 2^{nR_2'}\}, \\
f &: \mathcal{X}^n \times \mathcal{S}_1^n \times \{1, 2, \ldots, 2^{nR_2'}\} \to \{1, 2, \ldots, 2^{nR}\},
\end{aligned}$$

and a decoding map:

$$g: \{1, 2, \ldots, 2^{nR}\} \times \mathcal{S}_2^n \times \{1, 2, \ldots, 2^{nR_1'}\} \to \hat{\mathcal{X}}^n.$$

Fig. 14: An upper bound on the rate-distortion with two-sided increased partial side information: $R_{12}(D) \leq \min I(U; X, S_1|V_1, V_2) - I(U; S_2|V_1, V_2)$, where the minimization is over all PMFs $p(v_1|s_1)p(v_2|s_2)p(u|x, s_1, v_1, v_2)p(\hat{x}|u, s_2, v_1, v_2)$ such that $R_1' \geq I(V_1; S_1) - I(V_1; S_2, V_2)$, $R_2' \geq I(V_2; S_2) - I(V_2; X, S_1, V_1)$ and $E\left[\frac{1}{n}\sum_{i=1}^{n} d(X_i, \hat{X}_i)\right] \leq D$.

Fact 2: For a given distortion, $D$, and a given distortion measure, $d(X, \hat{X}): \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}^+$, the rate-distortion function $R_{12}(D)$ of this setup is bounded from above as follows:

$$R_{12}(D) \leq \min_{\substack{p(v_1|s_1)p(v_2|s_2)p(u|x,s_1,v_1,v_2)p(\hat{x}|u,s_2,v_1,v_2) \\ \text{s.t. } R_1' \geq I(V_1;S_1) - I(V_1;S_2,V_2) \\ R_2' \geq I(V_2;S_2) - I(V_2;X,S_1,V_1)}} I(U; X, S_1|V_1, V_2) - I(U; S_2|V_1, V_2), \tag{58}$$

for some joint distribution $p(x, s_1, s_2, v_1, v_2, u, \hat{x})$, where $E\left[\frac{1}{n}\sum_{i=1}^{n} d(X_i, \hat{X}_i)\right] \leq D$ and $U$, $V_1$ and $V_2$ are some auxiliary random variables.

The achievability proof is outlined in the following. The steps of the proof resemble the steps of the achievability proof for Fact 1.

Sketch of proof of Achievability for Fact 2: (a) The ESI encoder wants to describe $S_1^n$ to the decoder with a rate of $R_1'$. We generate $2^{n(I(V_1;S_1)+\epsilon)}$ sequences $V_1^n$ i.i.d. $\sim p(v_1)$ and randomly distribute them into $2^{n(I(V_1;S_1) - I(V_1;S_2,V_2) + 2\epsilon)}$ bins; each bin contains $2^{n(I(V_1;S_2,V_2) - \epsilon)}$ codewords. The ESI encoder is given the sequence $s_1^n$ and first looks for a sequence $v_1^n$ that is jointly typical with $s_1^n$. If there is such a codeword, the ESI encoder sends the index of the bin that contains $v_1^n$ to the decoder. The decoder, given $s_2^n, v_2^n$, looks for a unique codeword in the received bin that is jointly typical with $s_2^n, v_2^n$. Since there are more than $2^{nI(V_1;S_1)}$ sequences $V_1^n$, the ESI encoder is assured with high probability to find a sequence $v_1^n$ such that $(v_1^n, s_1^n) \in T_\epsilon(V_1, S_1)$. Since, in addition, there are fewer than $2^{nI(V_1;S_2,V_2)}$ codewords in the bin, the decoder is assured with high probability to find a unique sequence $v_1^n$ in the bin such that $(v_1^n, s_2^n, v_2^n) \in T_\epsilon(V_1, S_2, V_2)$. Therefore, the constraint on the rate of the shared ESI is maintained if $R_1' \geq I(V_1; S_1) - I(V_1; S_2, V_2)$.
(b) The DSI encoder wants to describe $S_2^n$ to the source encoder with a rate of $R_2'$. We generate $2^{n(I(V_2;S_2)+\epsilon)}$ sequences $V_2^n \sim$ i.i.d. $p(v_2)$ and randomly distribute them into $2^{n(I(V_2;S_2) - I(V_2;X,S_1,V_1) + 2\epsilon)}$ bins; each bin contains $2^{n(I(V_2;X,S_1,V_1) - \epsilon)}$ codewords. The DSI encoder, given $s_2^n$, first looks for a sequence $v_2^n$ that is jointly typical with $s_2^n$. If there is such a codeword, the DSI encoder sends the index of the bin where $v_2^n$ is located to the source encoder. The source encoder, given $x^n, s_1^n, v_1^n$, looks for a unique sequence $v_2^n$ in the received bin that is jointly typical with $x^n, s_1^n, v_1^n$. Since there are more than $2^{nI(V_2;S_2)}$ sequences $V_2^n$, the DSI encoder is assured with high probability to find a sequence $v_2^n$ such that $(v_2^n, s_2^n) \in T_\epsilon(V_2, S_2)$. At the same time, the source encoder is assured with high probability to find the unique sequence $v_2^n$ in its received bin such that $(v_2^n, x^n, s_1^n, v_1^n) \in T_\epsilon(V_2, X, S_1, V_1)$, since there are fewer than $2^{nI(V_2;X,S_1,V_1)}$ codewords $V_2^n$ in the bin. Therefore, the constraint on the rate of the shared DSI is maintained if $R_2' \geq I(V_2; S_2) - I(V_2; X, S_1, V_1)$.

(c) The source encoder wants to describe the source X to the decoder with distortion smaller than or equal 
to D; that is E d{X,X) < D. For each v'^.v"^ we generate 2"(^('^'-^>5'il^i>^2)+e) sequences [/" using the PMF 
p(u'^\v1,v'i) = ^LlP("»l«l,»'«2,^) and randomly distribute them into 2"(^(f^'-^'^i^l^i'^2)-/(c/;S2|Vi,y2)+2e) ^j^^^. 
each bin contains 2"(^('^''^^l^i'^^)~'^) codewords. The source encoder, given x",s",w",W2, looks for a sequence 
u" that is jointly typical with x",s",u",U2 and sends the index of the bin that contains u" to the decoder. The 
decoder, given Sg , w", v'2, looks for a unique sequence u" in the received bin that is jointly typical with S2^ f ", W2 ■ 
Upon finding the desired sequence w", the decoder declares Xi — g{ui,S2.i,vi,i,V2,i) for i G {l,2,...,n} to 
be the reconstruction of the source x". Having more than 2"-^('^'^''^il^i'^2) sequences t/" assures the encoder 
with high probability to find a sequence u" such that (u",a;", s", w", Wg) G Te {U,X,Si,vi,V2)- Since, in 
addition, each one of the bins contains there are less than 2"-''-'^'^^^^^''^'^^ codewords t/", the decoder is assured 
with high probability to find a unique sequence u" in the bin such that (u", s^i^iN ^2 ) '= Te {U, S2,vi,V2)- 
Therefore, and since the Markov chain {X, Si) — (U, 5*2, Fl, V2) — X is satisfied, we can conclude that a rate of 
R > I(U:X, 5i|Vi, V2) — I{U; 5'2|Fl, V2) allows the decoder to produce x" that satisfies the distortion constraint 
with high probability; i.e., that d{x", x") < D with high probability. This concludes the sketch of the proof of the 
achievability. 
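The rate bookkeeping in steps (a) to (c) follows one pattern: a codebook exponent, a bin-count exponent, and their difference as the per-bin exponent. The following is a minimal sketch of that arithmetic; the function name `binning_exponents` and the numerical values are illustrative assumptions and do not come from the paper.

```python
def binning_exponents(cover_rate, side_info_rate, eps):
    """Return per-symbol exponents (codebook, bins, codewords per bin).

    cover_rate     -- mutual information needed for covering, e.g. I(V1;S1)
    side_info_rate -- mutual information supplied by the receiver's side
                      information, e.g. I(V1;S2,V2)
    eps            -- slack epsilon > 0 from the typicality arguments
    """
    codebook = cover_rate + eps                    # 2^{n(I(V1;S1)+eps)} codewords
    bins = cover_rate - side_info_rate + 2 * eps   # 2^{n(I(V1;S1)-I(V1;S2,V2)+2eps)} bins
    per_bin = codebook - bins                      # exponent of codewords per bin
    return codebook, bins, per_bin


# Hypothetical values for I(V1;S1), I(V1;S2,V2) and epsilon.
codebook, bins, per_bin = binning_exponents(0.8, 0.5, 0.01)
print(codebook, bins, per_bin)
```

The per-bin exponent comes out as `side_info_rate - eps`, which matches the count of $2^{n(I(V_1;S_2,V_2)-\epsilon)}$ codewords per bin in step (a); the bin exponent is the rate that has to be sent.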



Appendix A 
Duality of the Converse of the Gelfand-Pinsker Theorem and the Wyner-Ziv Theorem 



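In this appendix we provide proofs of the converse of the Gelfand-Pinsker capacity and the converse of the Wyner-Ziv rate in a dual way.

Channel capacity:

\begin{align*}
nR &= H(W) \\
&\stackrel{(a)}{\leq} I(W;Y^n) - I(W;S^n) + n\epsilon_n \\
&= \sum_{i=1}^{n} \left[ I(W;Y_i|Y^{i-1}) - I(W;S_i|S_{i+1}^n) \right] + n\epsilon_n \\
&= \sum_{i=1}^{n} \left[ I(W,S_{i+1}^n;Y_i|Y^{i-1}) - I(W,Y^{i-1};S_i|S_{i+1}^n) \right] + \Delta - \Delta^* + n\epsilon_n \\
&\stackrel{(b)}{\leq} \sum_{i=1}^{n} \left[ I(W,Y^{i-1},S_{i+1}^n;Y_i) - I(W,Y^{i-1},S_{i+1}^n;S_i) \right] + n\epsilon_n \\
&= \sum_{i=1}^{n} \left[ I(U_i;Y_i) - I(U_i;S_i) \right] + n\epsilon_n, \tag{59}
\end{align*}

where

\begin{align*}
\Delta &= \sum_{i=1}^{n} I(Y^{i-1};S_i|W,S_{i+1}^n), \\
\Delta^* &= \sum_{i=1}^{n} I(S_{i+1}^n;Y_i|W,Y^{i-1}),
\end{align*}

(a) follows from Fano's inequality and from the fact that $W$ is independent of $S^n$, and (b) follows from the fact that $S_i$ is independent of $S_{i+1}^n$. The terms $\Delta$ and $\Delta^*$ cancel by the Csiszar sum identity.

Rate-distortion:

\begin{align*}
nR &= H(T) \\
&\stackrel{(a)}{\geq} I(T;X^n) - I(T;S^n) \\
&= \sum_{i=1}^{n} \left[ I(T;X_i|X^{i-1}) - I(T;S_i|S_{i+1}^n) \right] \\
&= \sum_{i=1}^{n} \left[ I(T,S_{i+1}^n;X_i|X^{i-1}) - I(T,X^{i-1};S_i|S_{i+1}^n) \right] + \Delta - \Delta^* \\
&\stackrel{(b)}{\geq} \sum_{i=1}^{n} \left[ I(T,X^{i-1},S_{i+1}^n;X_i) - I(T,X^{i-1},S_{i+1}^n;S_i) \right] \\
&= \sum_{i=1}^{n} \left[ I(U_i;X_i) - I(U_i;S_i) \right], \tag{60}
\end{align*}

where

\begin{align*}
\Delta &= \sum_{i=1}^{n} I(X^{i-1};S_i|T,S_{i+1}^n), \\
\Delta^* &= \sum_{i=1}^{n} I(S_{i+1}^n;X_i|T,X^{i-1}),
\end{align*}

(a) follows from Fano's inequality and from the fact that $T$ is independent of $S^n$, and (b) follows from the fact that $S_i$ is independent of $S_{i+1}^n$ and that $X_i$ is independent of $X^{i-1}$.

By substituting the output $Y$ and the input $X$ in the channel capacity theorem with the input $X$ and the output $\hat{X}$ in the rate-distortion theorem, respectively, we can observe duality in the converse proofs of the two theorems.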
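The cancellation of $\Delta$ and $\Delta^*$ in the channel-capacity column is an instance of the Csiszar sum identity, and it can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it draws an arbitrary random joint PMF over binary $(W, Y_1, Y_2, Y_3, S_1, S_2, S_3)$ with $n = 3$, and the helper names `entropy` and `cond_mi` are introduced here for the purpose.

```python
import itertools
import math
import random


def entropy(pmf, axes):
    """Entropy in bits of the marginal of `pmf` on the coordinates `axes`."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[a] for a in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cond_mi(pmf, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    a, b, c = list(a), list(b), list(c)
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


random.seed(0)
n = 3
W, Y, S = 0, [1, 2, 3], [4, 5, 6]   # coordinate positions in each outcome tuple

# Arbitrary (assumed) joint PMF over 1 + 2n binary coordinates.
outcomes = list(itertools.product([0, 1], repeat=1 + 2 * n))
weights = [random.random() for _ in outcomes]
total = sum(weights)
pmf = {o: w / total for o, w in zip(outcomes, weights)}

# Delta   = sum_i I(Y^{i-1}; S_i | W, S_{i+1}^n)
# Delta*  = sum_i I(S_{i+1}^n; Y_i | W, Y^{i-1})
delta = sum(cond_mi(pmf, Y[:i - 1], [S[i - 1]], [W] + S[i:])
            for i in range(1, n + 1))
delta_star = sum(cond_mi(pmf, S[i:], [Y[i - 1]], [W] + Y[:i - 1])
                 for i in range(1, n + 1))
print(delta, delta_star)
```

The two sums agree up to floating-point error for any joint PMF, since the identity does not rely on the channel structure.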

Appendix B 
Proof of Theorem 1
In this section we provide the proofs for Theorem 1, Cases 2 and $2_C$. The results for Case 1, where the encoder is informed with ESI and the decoder is informed with increased DSI, can be derived directly from [2, Theorem VII]. In [2], Steinberg considered the case where the encoder is fully informed with the ESI and the decoder is informed with a rate-limited description of the ESI. Therefore, by considering the DSI, $S_2^n$, to be a part of the channel's output, we can apply Steinberg's result to the channel depicted in Case 1. For this reason, the proof for this case is omitted.

A. Proof of Theorem 1, Case 2

Fig. 15: Channel capacity: Case 2. Lower bound: $C_2^{lb} = \max I(U;Y,S_2|V_2) - I(U;S_1|V_2)$, where the maximization is over all joint PMFs $p(s_1,s_2,v_2,u,x,y)$ that maintain the Markov relations $U - (S_1,V_2) - S_2$ and $V_2 - S_2 - S_1$ and the constraint $R' \geq I(V_2;S_2|S_1)$. Upper bounds: $C_2^{ub1}$ is the result of the same expressions as for the lower bound, except that the maximization is taken over all PMFs that maintain the Markov chain $U - (S_1,V_2) - S_2$, and $C_2^{ub2}$ is the result of the same expressions as for the lower bound, except that this time the maximization is taken over all PMFs that maintain $V_2 - S_2 - S_1$.

The proof of the lower bound, $C_2^{lb}$, is performed in the following way: for the description of the DSI, $S_2$, at a rate $R'$ we use a Wyner-Ziv coding scheme where the source is $S_2$ and the side information is $S_1$. Then, for the channel coding, we use a Gelfand-Pinsker coding scheme where the state information at the encoder is $S_1$, $S_2$ is a part of the channel's output and the rate-limited description of $S_2$ is side information at both the encoder and the decoder. Notice that $I(U;Y,S_2|V_2) - I(U;S_1|V_2) = I(U;Y,S_2,V_2) - I(U;S_1,V_2)$ and that, since the Markov chain $V_2 - S_2 - S_1$ holds, we can also write $R' \geq I(V_2;S_2) - I(V_2;S_1)$. We make use of these expressions in the following proof.
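Both rewritings used above can be sanity-checked numerically. The sketch below assumes a small binary example: it builds a joint PMF of the form $p(s_1,s_2)\,p(v_2|s_2)\,q(u,y|s_1,s_2,v_2)$, so that $V_2 - S_2 - S_1$ holds by construction, with all probabilities drawn at random. The helper names and the conditional $q$ are illustrative assumptions, not the paper's distributions.

```python
import itertools
import math
import random


def entropy(pmf, axes):
    """Entropy in bits of the marginal of `pmf` on the coordinates `axes`."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[a] for a in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cond_mi(pmf, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    a, b, c = list(a), list(b), list(c)
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def random_dist(k):
    """A random strictly positive distribution on k points."""
    w = [random.random() + 0.1 for _ in range(k)]
    t = sum(w)
    return [x / t for x in w]


random.seed(1)
S1, S2, V2, U, Y = range(5)           # coordinate positions
p_s = random_dist(4)                  # p(s1, s2), indexed by 2*s1 + s2
p_v2 = [random_dist(2) for _ in range(2)]   # p(v2 | s2)

pmf = {}
for s1, s2, v2 in itertools.product([0, 1], repeat=3):
    q = random_dist(4)                # assumed q(u, y | s1, s2, v2)
    for u, y in itertools.product([0, 1], repeat=2):
        pmf[(s1, s2, v2, u, y)] = p_s[2 * s1 + s2] * p_v2[s2][v2] * q[2 * u + y]

# I(U;Y,S2|V2) - I(U;S1|V2) versus I(U;Y,S2,V2) - I(U;S1,V2)
lhs_rate = cond_mi(pmf, [U], [Y, S2], [V2]) - cond_mi(pmf, [U], [S1], [V2])
rhs_rate = cond_mi(pmf, [U], [Y, S2, V2]) - cond_mi(pmf, [U], [S1, V2])

# I(V2;S2|S1) versus I(V2;S2) - I(V2;S1), valid under V2 - S2 - S1
lhs_desc = cond_mi(pmf, [V2], [S2], [S1])
rhs_desc = cond_mi(pmf, [V2], [S2]) - cond_mi(pmf, [V2], [S1])
print(lhs_rate, rhs_rate, lhs_desc, rhs_desc)
```

The first identity is the chain rule (both sides differ from each other by $I(U;V_2) - I(U;V_2)$) and holds for any joint PMF; the second one needs the Markov chain, which is why $V_2$ is generated from $S_2$ alone.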

Achievability: (Channel capacity Case 2 - Lower bound). Given $(S_{1,i},S_{2,i}) \sim$ i.i.d. $p(s_1,s_2)$ and the memoryless channel $p(y|x,s_1,s_2)$, fix $p(s_1,s_2,v_2,u,x,y) = p(s_1,s_2)p(v_2|s_2)p(u|s_1,v_2)p(x|u,s_1,v_2)p(y|x,s_1,s_2)$, where $x = f(u,s_1,v_2)$ (i.e., $p(x|u,s_1,v_2)$ can get the values 0 or 1).



Codebook generation and random binning

1) Generate a codebook $C_v$ of $2^{n(I(V_2;S_2)+2\epsilon)}$ sequences $V_2^n$ independently using i.i.d. $\sim p(v_2)$. Label them $v_2^n(k)$, where $k \in \{1,2,\ldots,2^{n(I(V_2;S_2)+2\epsilon)}\}$, and randomly assign each sequence $v_2^n(k)$ a bin number $b_v(v_2^n(k))$ in the set $\{1,2,\ldots,2^{nR'}\}$.

2) Generate a codebook $C_u$ of $2^{n(I(U;Y,S_2,V_2)-2\epsilon)}$ sequences $U^n$ independently using i.i.d. $\sim p(u)$. Label them $u^n(l)$, $l \in \{1,2,\ldots,2^{n(I(U;Y,S_2,V_2)-2\epsilon)}\}$, and randomly assign each sequence a bin number $b_u(u^n(l))$ in the set $\{1,2,\ldots,2^{nR}\}$.

Reveal the codebooks and the content of the bins to all encoders and decoders.

Encoding

1) State Encoder: Given the sequence $S_2^n$, search the codebook $C_v$ and identify an index $k$ such that $(v_2^n(k),S_2^n) \in T_\epsilon^{(n)}(V_2,S_2)$. If such a $k$ is found, stop searching and send the bin number $j = b_v(v_2^n(k))$. If no such $k$ is found, declare an error.

2) Encoder: Given the message $W$, the sequence $S_1^n$ and the index $j$, search the codebook $C_v$ and identify an index $k$ such that $(v_2^n(k),S_1^n) \in T_\epsilon^{(n)}(V_2,S_1)$. If no such $k$ is found or there is more than one such index, declare an error. If a unique $k$, as defined, is found, search the codebook $C_u$ and identify an index $l$ such that $(u^n(l),S_1^n,v_2^n(k)) \in T_\epsilon^{(n)}(U,S_1,V_2)$ and $b_u(u^n(l)) = W$. If a unique $l$, as defined, is found, transmit $x_i = f(u_i(l),S_{1,i},v_{2,i}(k))$, $i = 1,2,\ldots,n$. Otherwise, if there is no such $l$ or there is more than one, declare an error.

Decoding

Given the sequences $Y^n$, $S_2^n$ and the index $k$, search the codebook $C_u$ and identify an index $l$ such that $(u^n(l),Y^n,S_2^n,v_2^n(k)) \in T_\epsilon^{(n)}(U,Y,S_2,V_2)$. If a unique $l$, as defined, is found, declare the message $\hat{W}$ to be the bin index where $u^n(l)$ is located, i.e., $\hat{W} = b_u(u^n(l))$. Otherwise, if no such $l$ is found or there is more than one, declare an error.

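Analysis of the probability of error

Without loss of generality, let us assume that the message $W = 1$ was sent and that the indexes that correspond with the given $(W = 1, S_1^n, S_2^n)$ are $(k = 1, l = 1$ and $j = 1)$; i.e., $v_2^n(1)$ corresponds with $S_2^n$, $b_v(v_2^n(1)) = 1$, $u^n(1)$ is chosen according to $(W = 1, S_1^n, v_2^n(1))$ and $b_u(u^n(1)) = 1$.

Define the following events:

\begin{align*}
E_1 &:= \{\forall v_2^n(k) \in C_v,\ (v_2^n(k),S_2^n) \notin T_\epsilon^{(n)}(V_2,S_2)\} \\
E_2 &:= \{(v_2^n(1),S_2^n) \in T_\epsilon^{(n)}(V_2,S_2) \text{ but } (v_2^n(1),S_1^n) \notin T_\epsilon^{(n)}(V_2,S_1)\} \\
E_3 &:= \{\exists k' \neq 1 \text{ such that } b_v(v_2^n(k')) = 1 \text{ and } (v_2^n(k'),S_1^n) \in T_\epsilon^{(n)}(V_2,S_1)\} \\
E_4 &:= \{\forall u^n(l) \in C_u \text{ such that } b_u(u^n(l)) = 1,\ (u^n(l),S_1^n,v_2^n(1)) \notin T_\epsilon^{(n)}(U,S_1,V_2)\} \\
E_5 &:= \{(u^n(1),S_1^n,v_2^n(1)) \in T_\epsilon^{(n)}(U,S_1,V_2) \text{ but } (u^n(1),Y^n,S_2^n,v_2^n(1)) \notin T_\epsilon^{(n)}(U,Y,S_2,V_2)\} \\
E_6 &:= \{\exists l' \neq 1 \text{ such that } (u^n(l'),Y^n,S_2^n,v_2^n(1)) \in T_\epsilon^{(n)}(U,Y,S_2,V_2)\}
\end{align*}

The probability of error $P_e^{(n)}$ is upper bounded by $P_e^{(n)} \leq P(E_1) + P(E_2|E_1^c) + P(E_3|E_1^c,E_2^c) + P(E_4|E_1^c,E_2^c,E_3^c) + P(E_5|E_1^c,\ldots,E_4^c) + P(E_6|E_1^c,\ldots,E_5^c)$. Using standard arguments, and assuming that $(S_1^n,S_2^n) \in T_\epsilon^{(n)}(S_1,S_2)$ and that $n$ is large enough, we can state that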



1)
\begin{align*}
P(E_1) &= \Pr\Big\{ \bigcap_{v_2^n(k) \in C_v} \{(v_2^n(k),S_2^n) \notin T_\epsilon^{(n)}(V_2,S_2)\} \Big\} \\
&= \prod_{k=1}^{2^{n(I(V_2;S_2)+2\epsilon)}} \left[ 1 - \Pr\{(v_2^n(k),S_2^n) \in T_\epsilon^{(n)}(V_2,S_2)\} \right] \\
&\leq e^{-2^{n(I(V_2;S_2)+2\epsilon)}\, 2^{-n(I(V_2;S_2)+\epsilon)}} \\
&= e^{-2^{n\epsilon}}. \tag{61}
\end{align*}

The probability that there is no $v_2^n(k)$ in $C_v$ such that $(v_2^n(k),S_2^n)$ is strongly jointly typical is exponentially small provided that $|C_v| > 2^{n(I(V_2;S_2)+\epsilon)}$. This follows from the standard rate-distortion argument that $2^{nI(V_2;S_2)}$ $v_2^n$'s "cover" $S_2^n$; therefore $P(E_1) \to 0$.

2) By the Markov lemma [30], since $(S_1^n,S_2^n)$ are strongly jointly typical, $(S_2^n,v_2^n(1))$ are strongly jointly typical and the Markov chain $S_1 - S_2 - V_2$ holds, then $(S_1^n,S_2^n,v_2^n(1))$ are strongly jointly typical with high probability. Therefore, $P(E_2|E_1^c) \to 0$.
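3)
\begin{align*}
P(E_3|E_1^c,E_2^c) &= \Pr\Big\{ \bigcup_{\substack{v_2^n(k' \neq 1) \in C_v \\ b_v(v_2^n(k')) = 1}} \{(v_2^n(k'),S_1^n) \in T_\epsilon^{(n)}(V_2,S_1)\} \Big\} \tag{62} \\
&\leq \sum_{\substack{v_2^n(k' \neq 1) \in C_v \\ b_v(v_2^n(k')) = 1}} \Pr\{(v_2^n(k'),S_1^n) \in T_\epsilon^{(n)}(V_2,S_1)\} \tag{63} \\
&\leq \sum_{\substack{v_2^n(k' \neq 1) \in C_v \\ b_v(v_2^n(k')) = 1}} 2^{-n(I(V_2;S_1)-\epsilon)} \tag{64} \\
&= 2^{n(I(V_2;S_2)+2\epsilon-R')}\, 2^{-n(I(V_2;S_1)-\epsilon)} \tag{65} \\
&= 2^{n(I(V_2;S_2)-I(V_2;S_1)+3\epsilon-R')}. \tag{66}
\end{align*}

The probability that there is another index $k'$, $k' \neq 1$, such that $v_2^n(k')$ is in bin number 1 and is strongly jointly typical with $S_1^n$ is bounded by the number of $v_2^n(k')$'s in the bin times the probability of joint typicality. Therefore, if the binning rate satisfies $R' > I(V_2;S_2) - I(V_2;S_1) + 3\epsilon$ then $P(E_3|E_1^c,E_2^c) \to 0$.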

4) We use here the same argument we used for $P(E_1)$; by the covering lemma, we can state that the probability that there is no $u^n(l)$ in bin number 1 that is strongly jointly typical with $(S_1^n,v_2^n(1))$ tends to zero for large enough $n$ if the average number of $u^n(l)$'s in each bin is greater than $2^{n(I(U;S_1,V_2)+\epsilon)}$; i.e., $|C_u|/2^{nR} > 2^{n(I(U;S_1,V_2)+\epsilon)}$. This also implies that, in order to avoid an error, the rate one should use is $R < I(U;Y,S_2,V_2) - I(U;S_1,V_2) - 3\epsilon$, where the last expression also equals $I(U;Y,S_2|V_2) - I(U;S_1|V_2) - 3\epsilon$.

5) As we argued for $P(E_2|E_1^c)$, since $(X^n,u^n(1),S_1^n,v_2^n(1))$ is strongly jointly typical, $(Y^n,X^n,S_1^n,S_2^n)$ is strongly jointly typical and the Markov chain $(U,V_2) - (X,S_1,S_2) - Y$ holds, then, by the Markov lemma [30], $(u^n(1),Y^n,S_2^n,v_2^n(1))$ is strongly jointly typical with high probability, i.e., $P(E_5|E_1^c,\ldots,E_4^c) \to 0$.

6)
\begin{align*}
P(E_6|E_1^c,\ldots,E_5^c) &= \Pr\Big\{ \bigcup_{l' \neq 1} \{(u^n(l'),Y^n,S_2^n,v_2^n(1)) \in T_\epsilon^{(n)}(U,Y,S_2,V_2)\} \Big\} \\
&\leq \sum_{l'=2}^{2^{n(I(U;Y,S_2,V_2)-2\epsilon)}} \Pr\{(u^n(l'),Y^n,S_2^n,v_2^n(1)) \in T_\epsilon^{(n)}(U,Y,S_2,V_2)\} \\
&\leq \sum_{l'=2}^{2^{n(I(U;Y,S_2,V_2)-2\epsilon)}} 2^{-n(I(U;Y,S_2,V_2)-\epsilon)} \\
&\leq 2^{n(I(U;Y,S_2,V_2)-2\epsilon)}\, 2^{-n(I(U;Y,S_2,V_2)-\epsilon)} \\
&= 2^{-n\epsilon}. \tag{67}
\end{align*}

The probability that there is another index $l'$, $l' \neq 1$, such that $u^n(l')$ is strongly jointly typical with $(Y^n,S_2^n,v_2^n(1))$ is bounded by the total number of $u^n$'s times the probability of joint typicality. Therefore, taking $|C_u| < 2^{n(I(U;Y,S_2,V_2)-\epsilon)}$ assures us that $P(E_6|E_1^c,\ldots,E_5^c) \to 0$. This follows the standard channel capacity argument that one can distinguish at most $2^{nI(U;Y,S_2,V_2)}$ different $u^n(l)$'s given any typical member of $Y^n \times S_2^n \times V_2^n$.

This shows that for rates $R$ and $R'$ as described and for large enough $n$, the error events are of arbitrarily small probability. This concludes the proof of the achievability and the lower bound on the capacity of Case 2.
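The two exponent calculations that drive this analysis, the covering bound in (61) and the union (packing) bound in (67), can be evaluated directly for concrete numbers. The following is a minimal sketch; the function names and the values of `n`, `rate` and `eps` are assumptions chosen only to keep the powers of two within floating-point range.

```python
import math


def covering_miss_log_prob(n, rate, eps):
    """Natural log of (1 - 2^{-n(rate+eps)})^{2^{n(rate+2*eps)}}.

    This is the probability that none of 2^{n(rate+2*eps)} independent
    codewords is jointly typical, when each one is typical with
    probability 2^{-n(rate+eps)}.
    """
    p_hit = 2.0 ** (-n * (rate + eps))
    num_codewords = 2.0 ** (n * (rate + 2 * eps))
    return num_codewords * math.log1p(-p_hit)


def packing_union_bound(n, rate, eps):
    """Union bound 2^{n(rate-2*eps)} * 2^{-n(rate-eps)}, which equals 2^{-n*eps}."""
    return 2.0 ** (n * (rate - 2 * eps)) * 2.0 ** (-n * (rate - eps))


# Hypothetical values: `rate` stands in for the relevant mutual information.
n, rate, eps = 200, 0.5, 0.05
log_miss = covering_miss_log_prob(n, rate, eps)
union = packing_union_bound(n, rate, eps)
print(log_miss, union)
```

Since $\ln(1-p) \leq -p$, the logarithm of the covering-failure probability is at most $-2^{n\epsilon}$, which is doubly exponential decay, while the union bound decays only as $2^{-n\epsilon}$.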

Converse: (Channel capacity Case 2 - Upper bound). We first prove that it is possible to bound the capacity from above by using two random variables, $U$ and $V_2$, that maintain the Markov chain $U - (S_1,V_2) - S_2$ (that is, $C_2^{ub1}$). Then, we prove that it is also possible to upper-bound the capacity by using $U$ and $V_2$ that maintain the Markov relation $V_2 - S_2 - S_1$ (that is, $C_2^{ub2}$).

Fix the rates $R$ and $R'$ and a sequence of codes $(2^{nR},2^{nR'},n)$ that achieve the capacity. By Fano's inequality, $H(W|Y^n,S_2^n) \leq n\epsilon_n$, where $\epsilon_n \to 0$ as $n \to \infty$. Let $T_2 = f_v(S_2^n)$, and define $V_{2,i} = (T_2,Y^{i-1},S_{1,i+1}^n,S_2^{i-1})$, $U_i = W$; hence, the Markov chain $U_i - (S_{1,i},V_{2,i}) - S_{2,i}$ is maintained. The proof for this follows.

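\begin{align*}
p(u_i|s_{1,i},v_{2,i},s_{2,i}) &= p(w|s_{1,i},t_2,y^{i-1},s_{1,i+1}^n,s_2^{i-1},s_{2,i}) \\
&= \sum_{x^{i-1},s_1^{i-1}} p(w,x^{i-1},s_1^{i-1}|s_{1,i},t_2,y^{i-1},s_{1,i+1}^n,s_2^{i-1},s_{2,i}) \\
&\stackrel{(a)}{=} \sum_{x^{i-1},s_1^{i-1}} p(s_1^{i-1}|t_2,y^{i-1},s_{1,i}^n,s_2^{i-1})\, p(x^{i-1}|t_2,y^{i-1},s_1^n,s_2^{i-1})\, p(w|x^{i-1},t_2,y^{i-1},s_1^n,s_2^{i-1}) \\
&= p(w|t_2,y^{i-1},s_{1,i+1}^n,s_2^{i-1},s_{1,i}). \tag{68}
\end{align*}

Next, consider

\begin{align*}
nR' &\geq H(T_2) \\
&\geq H(T_2|S_1^n) - H(T_2|S_1^n,S_2^n) \\
&= I(T_2;S_2^n|S_1^n) \\
&= H(S_2^n|S_1^n) - H(S_2^n|T_2,S_1^n) \\
&= \sum_{i=1}^{n} \left[ H(S_{2,i}|S_1^n,S_2^{i-1}) - H(S_{2,i}|T_2,S_1^n,S_2^{i-1}) \right] \\
&\stackrel{(a)}{=} \sum_{i=1}^{n} \left[ H(S_{2,i}|S_{1,i}) - H(S_{2,i}|T_2,S_1^n,S_2^{i-1},Y^{i-1}) \right] \\
&\stackrel{(b)}{\geq} \sum_{i=1}^{n} \left[ H(S_{2,i}|S_{1,i}) - H(S_{2,i}|T_2,S_{1,i+1}^n,S_2^{i-1},Y^{i-1},S_{1,i}) \right] \\
&= \sum_{i=1}^{n} \left[ H(S_{2,i}|S_{1,i}) - H(S_{2,i}|V_{2,i},S_{1,i}) \right] \\
&= \sum_{i=1}^{n} I(S_{2,i};V_{2,i}|S_{1,i}), \tag{69}
\end{align*}

where (a) follows from the fact that $S_{2,i}$ is independent of $(S_1^{i-1},S_{1,i+1}^n,S_2^{i-1})$ given $S_{1,i}$, and the fact that $Y^{i-1}$ is independent of $S_{2,i}$ given $(T_2,S_1^n,S_2^{i-1})$ (the proof for this follows), and (b) follows from the fact that conditioning reduces entropy.

\begin{align*}
p(y^{i-1}|t_2,s_1^n,s_2^{i-1},s_{2,i}) &= \sum_{x^{i-1},w} p(y^{i-1},x^{i-1},w|t_2,s_1^n,s_2^{i-1},s_{2,i}) \\
&= \sum_{x^{i-1},w} p(w)\, p(x^{i-1}|w,t_2,s_1^n)\, p(y^{i-1}|x^{i-1},s_1^{i-1},s_2^{i-1}) \\
&= p(y^{i-1}|t_2,s_1^n,s_2^{i-1}), \tag{70}
\end{align*}

where we used the facts that $W$ is independent of $(T_2,S_1^n,S_2^n)$, $X^n$ is a function of $(W,T_2,S_1^n)$ and that the channel is memoryless; i.e., $Y^{i-1}$ is independent of $(W,T_2,S_1^n,S_2^n)$ given $(X^{i-1},S_1^{i-1},S_2^{i-1})$. We continue the proof of the converse by considering the following set of inequalities: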

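\begin{align*}
nR &= H(W) \\
&\leq H(W|T_2) - H(W|T_2,Y^n,S_2^n) + n\epsilon_n \\
&= I(W;Y^n,S_2^n|T_2) + n\epsilon_n \\
&= \sum_{i=1}^{n} I(W;Y_i,S_{2,i}|T_2,Y^{i-1},S_2^{i-1}) + n\epsilon_n \\
&\stackrel{(b)}{=} \sum_{i=1}^{n} \left[ I(W,S_{1,i+1}^n;Y_i,S_{2,i}|T_2,Y^{i-1},S_2^{i-1}) - I(S_{1,i+1}^n;Y_i,S_{2,i}|W,T_2,Y^{i-1},S_2^{i-1}) \right] + n\epsilon_n \\
&\stackrel{(c)}{=} \sum_{i=1}^{n} \left[ I(W,S_{1,i+1}^n;Y_i,S_{2,i}|T_2,Y^{i-1},S_2^{i-1}) - I(S_{1,i};Y^{i-1},S_2^{i-1}|W,T_2,S_{1,i+1}^n) \right] + n\epsilon_n \\
&= \sum_{i=1}^{n} \left[ I(W;Y_i,S_{2,i}|T_2,Y^{i-1},S_{1,i+1}^n,S_2^{i-1}) - I(S_{1,i};W|T_2,Y^{i-1},S_{1,i+1}^n,S_2^{i-1}) \right] + \Delta - \Delta^* + n\epsilon_n, \tag{71}
\end{align*}

where

\begin{align*}
\Delta &= \sum_{i=1}^{n} I(S_{1,i+1}^n;Y_i,S_{2,i}|T_2,Y^{i-1},S_2^{i-1}), \tag{72} \\
\Delta^* &= \sum_{i=1}^{n} I(S_{1,i};Y^{i-1},S_2^{i-1}|T_2,S_{1,i+1}^n), \tag{73}
\end{align*}

(b) follows from the mutual information properties and (c) follows from the Csiszar sum identity. By using the Csiszar sum identity on (72) and (73), we get

\begin{align*}
\Delta = \Delta^*, \tag{74}
\end{align*}

and, therefore, from (69) and (71),

\begin{align*}
R' &\geq \frac{1}{n}\sum_{i=1}^{n} I(S_{2,i};V_{2,i}|S_{1,i}), \tag{75} \\
R - \epsilon_n &\leq \frac{1}{n}\sum_{i=1}^{n} \left[ I(U_i;Y_i,S_{2,i}|V_{2,i}) - I(U_i;S_{1,i}|V_{2,i}) \right]. \tag{76}
\end{align*}

Using the convexity of $R'$ and Jensen's inequality, the standard time-sharing argument for $R$ and the fact that $\epsilon_n \to 0$ as $n \to \infty$, we can conclude that

\begin{align*}
R' &\geq I(V_2;S_2|S_1), \tag{77} \\
R &\leq I(U;Y,S_2|V_2) - I(U;S_1|V_2), \tag{78}
\end{align*}

where $U$ and $V_2$ maintain the Markov chain $U - (S_1,V_2) - S_2$.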



We now proceed to prove that it is possible to upper-bound the capacity of Case 2 by using two random variables, $U$ and $V_2$, that maintain the Markov chain $V_2 - S_2 - S_1$. Fix the rates $R$ and $R'$ and a sequence of codes $(2^{nR},2^{nR'},n)$ that achieve the capacity. By Fano's inequality, $H(W|Y^n,S_2^n) \leq n\epsilon_n$, where $\epsilon_n \to 0$ as $n \to \infty$. Let $T_2 = f_v(S_2^n)$ and define $V_{2,i} = (T_2,S_2^{i-1})$, $U_i = (W,Y^{i-1},S_{1,i+1}^n)$. The Markov chain $V_{2,i} - S_{2,i} - S_{1,i}$ is maintained. Then,

\begin{align*}
nR' &\geq H(T_2) \\
&\geq H(T_2|S_1^n) - H(T_2|S_1^n,S_2^n) \\
&= \sum_{i=1}^{n} \left[ H(S_{2,i}|S_1^n,S_2^{i-1}) - H(S_{2,i}|T_2,S_1^n,S_2^{i-1}) \right] \\
&\stackrel{(a)}{=} \sum_{i=1}^{n} \left[ H(S_{2,i}|S_{1,i}) - H(S_{2,i}|T_2,S_{1,i},S_{1,i+1}^n,S_2^{i-1}) \right] \\
&\geq \sum_{i=1}^{n} \left[ H(S_{2,i}|S_{1,i}) - H(S_{2,i}|T_2,S_{1,i},S_2^{i-1}) \right] \\
&= \sum_{i=1}^{n} \left[ H(S_{2,i}|S_{1,i}) - H(S_{2,i}|V_{2,i},S_{1,i}) \right] \\
&= \sum_{i=1}^{n} I(S_{2,i};V_{2,i}|S_{1,i}), \tag{79}
\end{align*}

where (a) follows from the fact that $S_{2,i}$ is independent of $(S_1^{i-1},S_{1,i+1}^n,S_2^{i-1})$ given $S_{1,i}$, and the fact that $(Y^{i-1},S_1^{i-1})$ is independent of $S_{2,i}$ given $(T_2,S_{1,i}^n,S_2^{i-1})$; the proof for this follows.

\begin{align*}
p(y^{i-1},s_1^{i-1}|t_2,s_{1,i}^n,s_2^{i-1},s_{2,i}) &= \sum_{x^{i-1},w} p(y^{i-1},s_1^{i-1},x^{i-1},w|t_2,s_{1,i}^n,s_2^{i-1},s_{2,i}) \\
&= p(y^{i-1},s_1^{i-1}|t_2,s_{1,i}^n,s_2^{i-1}), \tag{80}
\end{align*}

where we used the facts that $W$ is independent of $(T_2,S_1^n,S_2^n)$, $S_1^{i-1}$ is independent of $(T_2,S_{1,i}^n,S_{2,i})$ given $S_2^{i-1}$, $X^n$ is a function of $(W,T_2,S_1^n)$ and that the channel is memoryless; i.e., $Y^{i-1}$ is independent of $(W,T_2,S_1^n,S_2^n)$ given $(X^{i-1},S_1^{i-1},S_2^{i-1})$.

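In order to complete our proof, we need the following lemma.

Lemma 8. The following inequality holds:

\begin{align*}
\sum_{i=1}^{n} I(S_{1,i};W,Y^{i-1},S_{1,i+1}^n|T_2,S_2^{i-1}) \leq \sum_{i=1}^{n} I(S_{1,i};W,Y^{i-1},S_2^{i-1}|T_2,S_{1,i+1}^n). \tag{81}
\end{align*}

Proof: Notice that

\begin{align*}
\sum_{i=1}^{n} I(S_{1,i};W,Y^{i-1},S_{1,i+1}^n|T_2,S_2^{i-1}) = \sum_{i=1}^{n} \left[ I(S_{1,i};W,Y^{i-1},S_{1,i+1}^n,S_2^{i-1}|T_2) - I(S_{1,i};S_2^{i-1}|T_2) \right] \tag{82}
\end{align*}

and that

\begin{align*}
\sum_{i=1}^{n} I(S_{1,i};W,Y^{i-1},S_2^{i-1}|T_2,S_{1,i+1}^n) = \sum_{i=1}^{n} \left[ I(S_{1,i};W,Y^{i-1},S_{1,i+1}^n,S_2^{i-1}|T_2) - I(S_{1,i};S_{1,i+1}^n|T_2) \right]. \tag{83}
\end{align*}

Therefore, it is enough to show that $\sum_{i=1}^{n} -I(S_{1,i};S_2^{i-1}|T_2) \leq \sum_{i=1}^{n} -I(S_{1,i};S_{1,i+1}^n|T_2)$ holds in order to prove the lemma. Therefore, consider

\begin{align*}
\sum_{i=1}^{n} -I(S_{1,i};S_{1,i+1}^n|T_2) - \Big( \sum_{i=1}^{n} -I(S_{1,i};S_2^{i-1}|T_2) \Big) &= \sum_{i=1}^{n} \left[ H(S_{1,i}|T_2,S_{1,i+1}^n) - H(S_{1,i}|T_2,S_2^{i-1}) \right] \\
&= H(S_1^n|T_2) - \sum_{i=1}^{n} H(S_{1,i}|T_2,S_2^{i-1}) \\
&= \sum_{i=1}^{n} \left[ H(S_{1,i}|T_2,S_1^{i-1}) - H(S_{1,i}|T_2,S_2^{i-1}) \right] \\
&\stackrel{(a)}{\geq} 0, \tag{84}
\end{align*}

where (a) follows from the fact that the Markov chain $S_{1,i} - (T_2,S_2^{i-1}) - (T_2,S_1^{i-1})$ holds and from the data processing inequality. This completes the proof of the lemma. ■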

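We continue the proof of the converse by considering the following set of inequalities:

\begin{align*}
nR &= H(W) \\
&\leq H(W|T_2) - H(W|T_2,Y^n,S_2^n) + n\epsilon_n \\
&= \sum_{i=1}^{n} I(W;Y_i,S_{2,i}|T_2,Y^{i-1},S_2^{i-1}) + n\epsilon_n \\
&\stackrel{(a)}{=} \sum_{i=1}^{n} \left[ I(W,S_{1,i+1}^n;Y_i,S_{2,i}|T_2,Y^{i-1},S_2^{i-1}) - I(S_{1,i+1}^n;Y_i,S_{2,i}|W,T_2,Y^{i-1},S_2^{i-1}) \right] + n\epsilon_n \\
&\stackrel{(b)}{=} \sum_{i=1}^{n} \left[ I(W,S_{1,i+1}^n;Y_i,S_{2,i}|T_2,Y^{i-1},S_2^{i-1}) - I(S_{1,i};Y^{i-1},S_2^{i-1}|W,T_2,S_{1,i+1}^n) \right] + n\epsilon_n \\
&\stackrel{(c)}{\leq} \sum_{i=1}^{n} \left[ I(W,S_{1,i+1}^n;Y_i,S_{2,i}|T_2,Y^{i-1},S_2^{i-1}) - I(S_{1,i};W,Y^{i-1},S_{1,i+1}^n|T_2,S_2^{i-1}) \right] + n\epsilon_n \\
&\leq \sum_{i=1}^{n} \left[ I(U_i;Y_i,S_{2,i}|T_2,S_2^{i-1}) - I(U_i;S_{1,i}|V_{2,i}) \right] + n\epsilon_n, \tag{85}
\end{align*}

where (a) follows from the mutual information properties, (b) follows from the Csiszar sum identity and (c) follows from Lemma 8. Therefore,

\begin{align*}
R' &\geq \frac{1}{n}\sum_{i=1}^{n} I(S_{2,i};V_{2,i}|S_{1,i}), \tag{86} \\
R - \epsilon_n &\leq \frac{1}{n}\sum_{i=1}^{n} \left[ I(U_i;Y_i,S_{2,i}|V_{2,i}) - I(U_i;S_{1,i}|V_{2,i}) \right]. \tag{87}
\end{align*}

Using the convexity of $R'$ and Jensen's inequality, the standard time-sharing argument for $R$ and the fact that $\epsilon_n \to 0$ as $n \to \infty$, we can conclude that

\begin{align*}
R' &\geq I(V_2;S_2|S_1), \tag{88} \\
R &\leq I(U;Y,S_2|V_2) - I(U;S_1|V_2), \tag{89}
\end{align*}

where the Markov chain $V_2 - S_2 - S_1$ holds. Therefore, we can conclude that the expression given in (12) is an upper bound to any achievable rate. This concludes the proof of the upper bound and the proof of Theorem 1, Case 2.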

B. Proof of Theorem 1, Case $2_C$

For describing the DSI, $S_2$, with a rate $R'$ we use the standard rate-distortion coding scheme. Then, for the channel coding we use the Shannon strategy [4] coding scheme where the channel's causal state information at the encoder is $S_1$, $S_2$ is a part of the channel's output and the rate-limited description of $S_2$ is the side information at both the encoder and the decoder.

Fig. 16: Channel capacity: Case 2 with causal ESI. $C_{2C} = \max I(U;Y,S_2|V_2)$, where the maximization is over all PMFs $p(v_2|s_2)p(u|v_2)p(x|u,s_1,v_2)$ such that $R' \geq I(V_2;S_2)$.

Achievability: (Channel capacity Case $2_C$). Given $(S_{1,i},S_{2,i}) \sim$ i.i.d. $p(s_1,s_2)$, where the ESI is known in a causal way ($S_1^i$ at time $i$), and the memoryless channel $p(y|x,s_1,s_2)$, fix $p(s_1,s_2,v_2,u,x,y) = p(s_1,s_2)p(v_2|s_2)p(u|v_2)p(x|u,s_1,v_2)p(y|x,s_1,s_2)$, where $x = f(u,s_1,v_2)$ (i.e., $p(x|u,s_1,v_2)$ can get the values 0 or 1).


Codebook generation and random binning

1) Generate a codebook $C_v$ of $2^{n(I(V_2;S_2)+2\epsilon)}$ sequences $V_2^n$ independently using i.i.d. $\sim p(v_2)$. Label them $v_2^n(k)$, where $k \in \{1,2,\ldots,2^{n(I(V_2;S_2)+2\epsilon)}\}$.

2) For each $v_2^n(k)$ generate a codebook $C_u(k)$ of $2^{n(I(U;Y,S_2|V_2)-2\epsilon)}$ sequences $U^n$ distributed independently according to i.i.d. $\sim p(u|v_2)$. Label them $u^n(w,k)$, where $w \in \{1,2,\ldots,2^{n(I(U;Y,S_2|V_2)-2\epsilon)}\}$, and associate the sequences $u^n(w,\cdot)$ with the message $W = w$.

Reveal the codebooks and the content of the bins to all encoders and decoders.

Encoding

1) State Encoder: Given the sequence $S_2^n$, search the codebook $C_v$ and identify an index $k$ such that $(v_2^n(k),S_2^n) \in T_\epsilon^{(n)}(V_2,S_2)$. If such a $k$ is found, stop searching and send it. Otherwise, if no such $k$ is found, declare an error.

2) Encoder: Given the message $W \in \{1,2,\ldots,2^{n(I(U;Y,S_2|V_2)-2\epsilon)}\}$, the index $k$ and $S_1^i$ at time $i$, identify $u^n(W,k)$ in the codebook $C_u(k)$ and transmit $x_i = f(u_i(W,k),S_{1,i},v_{2,i}(k))$ at any time $i \in \{1,2,\ldots,n\}$. The element $x_i$ is the result of a multiplexer with an input signal $(u_i(W,k),v_{2,i}(k))$ and a control signal $S_{1,i}$.

Decoding

Given $Y^n$, $S_2^n$ and $k$, look for a unique index $\hat{W}$, associated with the sequence $u^n(\hat{W},k) \in C_u(k)$, such that $(Y^n,S_2^n,u^n(\hat{W},k)) \in T_\epsilon^{(n)}(Y,U,S_2|v_2^n(k))$. If a unique such $\hat{W}$ is found, declare that the sent message was $\hat{W}$. Otherwise, if no unique index $\hat{W}$ exists, declare an error.
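The multiplexer view of the causal encoder can be made concrete with a toy example. The sketch below assumes binary alphabets and a hypothetical lookup table `STRATEGY_TABLE`: each input pair $(u_i, v_{2,i})$ selects a row (a mapping from state to channel input), and the current state $s_{1,i}$ acts as the control signal that picks the entry. None of these names or table values come from the paper.

```python
# Hypothetical strategy table: (u, v2) -> (x when s1 == 0, x when s1 == 1).
STRATEGY_TABLE = {
    (0, 0): (0, 0),
    (0, 1): (0, 1),
    (1, 0): (1, 1),
    (1, 1): (1, 0),
}


def f(u, s1, v2):
    """Multiplexer: the input signal (u, v2) picks a row, s1 picks the entry."""
    return STRATEGY_TABLE[(u, v2)][s1]


def causal_encode(u_seq, s1_seq, v2_seq):
    """Produce x_i = f(u_i, s1_i, v2_i) using only the current state s1_i."""
    return [f(u, s1, v2) for u, s1, v2 in zip(u_seq, s1_seq, v2_seq)]


# Toy sequences of length n = 4 (assumed values).
u_seq = [0, 1, 1, 0]
v2_seq = [1, 1, 0, 0]
s1_seq = [1, 0, 1, 1]
x_seq = causal_encode(u_seq, s1_seq, v2_seq)
print(x_seq)
```

Because each $x_i$ depends on $s_{1,i}$ only, the encoder never needs future state symbols, which is the causality requirement of Case $2_C$.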



Analysis of the probability of error

Without loss of generality, let us assume that the message $W = 1$ was sent and that the index $k$ that corresponds with $S_2^n$ is $k = 1$; i.e., $v_2^n(1)$ corresponds to $S_2^n$ and $u^n(1,1)$ is chosen according to $(W = 1, v_2^n(1))$.
Define the following events:

\begin{align*}
E_1 &:= \{\forall v_2^n(k) \in C_v,\ (S_2^n,v_2^n(k)) \notin T_\epsilon^{(n)}(S_2,V_2)\} \\
E_2 &:= \{(S_2^n,v_2^n(1)) \in T_\epsilon^{(n)}(S_2,V_2) \text{ but } (u^n(1,1),Y^n,S_2^n) \notin T_\epsilon^{(n)}(U,Y,S_2|v_2^n(1))\} \\
E_3 &:= \{\exists w' \neq 1 : u^n(w',1) \in C_u(1) \text{ and } (u^n(w',1),Y^n,S_2^n) \in T_\epsilon^{(n)}(U,Y,S_2|v_2^n(1))\}.
\end{align*}

The probability of error $P_e^{(n)}$ is upper bounded by $P_e^{(n)} \leq P(E_1) + P(E_2|E_1^c) + P(E_3|E_1^c,E_2^c)$. Using standard arguments and assuming that $(S_1^n,S_2^n) \in T_\epsilon^{(n)}(S_1,S_2)$ and that $n$ is large enough, we can state that

1) For each sequence $v_2^n \in C_v$, the probability that $v_2^n$ is not jointly typical with $S_2^n$ is at most $(1 - 2^{-n(I(V_2;S_2)+\epsilon)})$. Therefore, having $2^{n(I(V_2;S_2)+2\epsilon)}$ i.i.d. sequences in $C_v$, the probability that none of those sequences is jointly typical with $S_2^n$ is bounded by

\begin{align*}
P(E_1) &\leq \left( 1 - 2^{-n(I(V_2;S_2)+\epsilon)} \right)^{2^{n(I(V_2;S_2)+2\epsilon)}} \\
&\leq e^{-2^{n(I(V_2;S_2)+2\epsilon)}\, 2^{-n(I(V_2;S_2)+\epsilon)}} \\
&= e^{-2^{n\epsilon}}, \tag{90}
\end{align*}

where, for every $\epsilon > 0$, the last line goes to zero as $n$ goes to infinity.

2) The random variable $Y^n$ is distributed according to $p(y|x,s_1,s_2) = p(y|x,s_1,s_2,v_2)$; therefore, having $(S_2^n,v_2^n(1)) \in T_\epsilon^{(n)}(S_2,V_2)$ implies that $(Y^n,S_2^n,v_2^n(1)) \in T_\epsilon^{(n)}(Y,S_2,V_2)$. Recall that $x_i = f(u_i(1,1),S_{1,i},v_{2,i}(1))$ and that $U^n$ is generated according to $p(u|v_2)$; therefore, $(X^n,S_1^n,u^n(1,1),v_2^n(1))$ is jointly typical. Thus, by the Markov lemma [30], we can state that $(Y^n,S_2^n,u^n(1,1),v_2^n(1)) \in T_\epsilon^{(n)}(Y,S_2,U,V_2)$ with high probability for a large enough $n$.

3) Now, the probability for a random $U^n$, such that $(U^n,v_2^n(1)) \in T_\epsilon^{(n)}(U,V_2)$, to be also jointly typical with $(Y^n,S_2^n,v_2^n(1))$ is upper bounded by $2^{-n(I(U;Y,S_2|V_2)-\epsilon)}$; hence

\begin{align*}
P(E_3|E_1^c,E_2^c) &\leq \sum_{1 < w' \leq |C_u(1)|} \Pr\{(u^n(w',1),Y^n,S_2^n) \in T_\epsilon^{(n)}(U,Y,S_2|v_2^n(1))\} \\
&\leq \sum_{1 < w' \leq |C_u(1)|} 2^{-n(I(U;Y,S_2|V_2)-\epsilon)} \\
&\leq 2^{n(I(U;Y,S_2|V_2)-2\epsilon)}\, 2^{-n(I(U;Y,S_2|V_2)-\epsilon)} \\
&= 2^{-n\epsilon}, \tag{91}
\end{align*}

which goes to zero exponentially fast with $n$ for every $\epsilon > 0$.
Therefore, $P_e^{(n)} = \Pr(\hat{W} \neq W)$ goes to zero as $n \to \infty$.

Converse: (Channel capacity Case $2_C$). Fix the rates $R$ and $R'$ and a sequence of codes $(2^{nR},2^{nR'},n)$ that achieve capacity. By Fano's inequality, $H(W|Y^n,S_2^n) \leq n\epsilon_n$, where $\epsilon_n \to 0$ as $n \to \infty$. Let $T_2 = f_v(S_2^n)$, and define $V_{2,i} = (T_2,Y^{i-1},S_2^{i-1})$, $U_i = W$. Then,

\begin{align*}
nR' &\geq H(T_2) \\
&\geq H(T_2) - H(T_2|S_2^n) \\
&= I(T_2;S_2^n) \\
&= H(S_2^n) - H(S_2^n|T_2) \\
&= \sum_{i=1}^{n} \left[ H(S_{2,i}|S_2^{i-1}) - H(S_{2,i}|T_2,S_2^{i-1}) \right] \\
&\stackrel{(a)}{=} \sum_{i=1}^{n} \left[ H(S_{2,i}) - H(S_{2,i}|T_2,S_2^{i-1},Y^{i-1}) \right] \\
&= \sum_{i=1}^{n} I(S_{2,i};T_2,Y^{i-1},S_2^{i-1}) \\
&= \sum_{i=1}^{n} I(S_{2,i};V_{2,i}), \tag{92}
\end{align*}

where (a) follows from the fact that $S_{2,i}$ is independent of $S_2^{i-1}$ and the fact that $S_{2,i}$ is independent of $Y^{i-1}$ given $(T_2,S_2^{i-1})$. The proof for this follows.

\begin{align*}
p(y^{i-1}|t_2,s_2^{i-1},s_{2,i}) &= \sum_{w,x^{i-1},s_1^{i-1}} p(w)\, p(s_1^{i-1}|s_2^{i-1})\, p(x^{i-1}|w,t_2,s_1^{i-1})\, p(y^{i-1}|x^{i-1},s_1^{i-1},s_2^{i-1}) \\
&= p(y^{i-1}|t_2,s_2^{i-1}), \tag{93}
\end{align*}

where we used the fact that $W$ is independent of $(T_2,S_1^{i-1},S_{2,i})$, $S_1^{i-1}$ is independent of $(T_2,S_{2,i})$ given $S_2^{i-1}$, $X^{i-1}$ is a function of $(W,T_2,S_1^{i-1})$ and that $Y^{i-1}$ is independent of $(W,T_2,S_{2,i})$ given $(X^{i-1},S_1^{i-1},S_2^{i-1})$. We now continue with the proof of the converse.

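\begin{align*}
nR &\leq H(W) \\
&\leq H(W|T_2) - H(W|T_2,Y^n,S_2^n) + n\epsilon_n \\
&= I(W;Y^n,S_2^n|T_2) + n\epsilon_n \\
&= \sum_{i=1}^{n} I(W;Y_i,S_{2,i}|T_2,Y^{i-1},S_2^{i-1}) + n\epsilon_n \\
&= \sum_{i=1}^{n} I(U_i;Y_i,S_{2,i}|V_{2,i}) + n\epsilon_n, \tag{94}
\end{align*}

and therefore, from (92) and (94),

\begin{align*}
R' &\geq \frac{1}{n}\sum_{i=1}^{n} I(S_{2,i};V_{2,i}), \tag{95} \\
R - \epsilon_n &\leq \frac{1}{n}\sum_{i=1}^{n} I(U_i;Y_i,S_{2,i}|V_{2,i}). \tag{96}
\end{align*}

Using the convexity of $R'$ and Jensen's inequality, the standard time-sharing argument for $R$ and the fact that $\epsilon_n \to 0$ as $n \to \infty$, we can conclude that

\begin{align*}
R' &\geq I(V_2;S_2), \tag{97} \\
R &\leq I(U;Y,S_2|V_2). \tag{98}
\end{align*}

Notice that the Markov chain $V_{2,i} - S_{2,i} - S_{1,i}$ holds since $(Y^{i-1},S_2^{i-1})$ is independent of $S_{1,i}$ and $T_2(S_2^n)$ is dependent on $S_{1,i}$ only through $S_{2,i}$. Notice also that the Markov chain $U_i - V_{2,i} - (S_{1,i},S_{2,i})$ holds since

\begin{align*}
p(w|t_2,y^{i-1},s_2^{i-1},s_{1,i},s_{2,i}) &= \sum_{x^{i-1},s_1^{i-1}} p(w,x^{i-1},s_1^{i-1}|t_2,y^{i-1},s_2^{i-1},s_{1,i},s_{2,i}) \\
&= \sum_{x^{i-1},s_1^{i-1}} p(s_1^{i-1}|t_2,y^{i-1},s_2^{i-1})\, p(x^{i-1}|t_2,y^{i-1},s_1^{i-1},s_2^{i-1})\, p(w|t_2,x^{i-1},s_1^{i-1}) \\
&= p(w|t_2,y^{i-1},s_2^{i-1}). \tag{99}
\end{align*}

This concludes the converse, and the proof of Theorem 1, Case $2_C$.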



Appendix C 
Proof of Theorem 2

In this section we provide the proof of Theorem 2, Cases 1 and $1_C$. Case 2, where the encoder is informed with increased ESI and the decoder is informed with DSI, is a special case of [10] for $K = 1$ and, therefore, the proof for this case is omitted. Following Kaspi's scheme (Figure 17) for $K = 1$, at the first stage, node $W$ sends a description of $W$ with a rate limited to $R_w$; then, after reconstructing $W$ at the $Z$ node, it sends a function of $Z$ and $W$ over to node $W$ with a rate limited to $R_z$. Let $S_2$ be $W$ in Kaspi's scheme and $(X,S_1)$ be $Z$ in Kaspi's scheme. Consider $D_z = d(Z_i,\hat{Z}_i) = d((X_i,S_{1,i}),(\hat{X}_i,\hat{S}_{1,i})) = d(X_i,\hat{X}_i) = D$. Then, it is apparent that Case 2 of the rate-distortion problems is a special case of Kaspi's two-way problem for $K = 1$.


Fig. 17: Kaspi's two-way source coding scheme. The total rates are $R_w = \sum_{k=1}^{K} R_w^k$ and $R_z = \sum_{k=1}^{K} R_z^k$ and the expected per-letter distortions are $D_w = E\left[\frac{1}{n}\sum_{i=1}^{n} d(W_i,\hat{W}_i)\right]$ and $D_z = E\left[\frac{1}{n}\sum_{i=1}^{n} d(Z_i,\hat{Z}_i)\right]$.



A. Proof of Theorem 2, Case 1

We use the Wyner-Ziv coding scheme for the description of the ESI, $S_1$, at a rate $R'$, where the source is $S_1$ and the side information at the decoder is $S_2$. Then, to describe the main source, $X$, with distortion less than or equal to $D$ we use the Wyner-Ziv coding scheme again, where this time $S_2$ is the side information at the decoder, $S_1$ is a part of the source and the rate-limited description of $S_1$ is the side information at both the encoder and the decoder. Notice that $I(U;X,S_1|V_1) - I(U;S_2|V_1) = I(U;X,S_1,V_1) - I(U;S_2,V_1)$ and that, since the Markov chain $V_1 - S_1 - S_2$ holds, it is also possible to write $R' \geq I(V_1;S_1) - I(V_1;S_2)$; we use these expressions in the following proof.

Fig. 18: Rate-distortion: Case 1. $R_1(D) = \min I(U;X,S_1|V_1) - I(U;S_2|V_1)$, where the minimization is over all PMFs $p(v_1|s_1)p(u|x,s_1,v_1)p(\hat{x}|u,s_2,v_1)$ such that $R' \geq I(V_1;S_1|S_2)$ and $E[d(X,\hat{X})] \leq D$.

Achievability: (Rate-distortion Case 1). Given $(X_i,S_{1,i},S_{2,i})$ i.i.d. $\sim p(x,s_1,s_2)$ and the distortion measure $D$, fix $p(x,s_1,s_2,v_1,u,\hat{x}) = p(x,s_1,s_2)p(v_1|s_1)p(u|x,s_1,v_1)p(\hat{x}|u,s_2,v_1)$ that satisfies $E[d(X,\hat{X})] = D$ and $\hat{x} = f(u,s_2,v_1)$.

Codebook generation and random binning

1) Generate a codebook, $C_v$, of $2^{n(I(V_1;S_1)+2\epsilon)}$ sequences, $V_1^n$, independently using i.i.d. $\sim p(v_1)$. Label them $v_1^n(k)$, where $k \in \{1,2,\ldots,2^{n(I(V_1;S_1)+2\epsilon)}\}$, and randomly assign each sequence $v_1^n(k)$ a bin number $b_v(v_1^n(k))$ in the set $\{1,2,\ldots,2^{nR'}\}$.

2) Generate a codebook $C_u$ of $2^{n(I(U;X,S_1,V_1)+2\epsilon)}$ sequences $U^n$ independently using i.i.d. $\sim p(u)$. Label them $u^n(l)$, where $l \in \{1,2,\ldots,2^{n(I(U;X,S_1,V_1)+2\epsilon)}\}$, and randomly assign each $u^n(l)$ a bin number $b_u(u^n(l))$ in the set $\{1,2,\ldots,2^{nR}\}$.

Reveal the codebooks and the content of the bins to all encoders and decoders.
Encoding 

1) State Encoder: Given the sequence $S_1^n$, search the codebook $C_v$ and identify an index $k$ such that $(S_1^n,v_1^n(k)) \in T_\epsilon^{(n)}(S_1,V_1)$. If such a $k$ is found, stop searching and send the bin number $j = b_v(v_1^n(k))$. If no such $k$ is found, declare an error.

2) Encoder: Given the sequences $X^n$, $S_1^n$ and $v_1^n(k)$, search the codebook $C_u$ and identify an index $l$ such that $(X^n,S_1^n,v_1^n(k),u^n(l)) \in T_\epsilon^{(n)}(X,S_1,V_1,U)$. If such an $l$ is found, stop searching and send the bin number $w = b_u(u^n(l))$. If no such $l$ is found, declare an error.

Decoding

Given the bin indices $w$ and $j$ and the sequence $S_2^n$, search the codebook $C_v$ and identify an index $k$ such that $(S_2^n,v_1^n(k)) \in T_\epsilon^{(n)}(S_2,V_1)$ and $b_v(v_1^n(k)) = j$. If no such $k$ is found or there is more than one such index, declare an error. If a unique $k$, as defined, is found, search the codebook $C_u$ and identify an index $l$ such that $(S_2^n,v_1^n(k),u^n(l)) \in T_\epsilon^{(n)}(S_2,V_1,U)$ and $b_u(u^n(l)) = w$. If a unique $l$, as defined, is found, declare $\hat{X}_i = f(u_i(l),S_{2,i},v_{1,i}(k))$, $i = 1,2,\ldots,n$. Otherwise, if there is no such $l$ or there is more than one, declare an error.

Analysis of the probability of error

Without loss of generality, for the following events $E_2$, $E_3$, $E_4$, $E_5$ and $E_6$, assume that $v_1^n(k = 1)$ and $b_v(v_1^n(k = 1)) = 1$ correspond to the sequences $(X^n,S_1^n,S_2^n)$, and for the events $E_5$ and $E_6$ assume that $u^n(l = 1)$ and $b_u(u^n(l = 1)) = 1$ correspond to the same given sequences. Define the following events:

\begin{align*}
E_1 &:= \{\forall v_1^n(k) \in C_v,\ (S_1^n,v_1^n(k)) \notin T_\epsilon^{(n)}(S_1,V_1)\} \\
E_2 &:= \{(S_1^n,v_1^n(1)) \in T_\epsilon^{(n)}(S_1,V_1) \text{ but } (S_2^n,v_1^n(1)) \notin T_\epsilon^{(n)}(S_2,V_1)\} \\
E_3 &:= \{\exists k' \neq 1 \text{ such that } b_v(v_1^n(k')) = 1 \text{ and } (S_2^n,v_1^n(k')) \in T_\epsilon^{(n)}(S_2,V_1)\} \\
E_4 &:= \{\forall u^n(l) \in C_u,\ (X^n,S_1^n,v_1^n(1),u^n(l)) \notin T_\epsilon^{(n)}(X,S_1,V_1,U)\} \\
E_5 &:= \{(X^n,S_1^n,v_1^n(1),u^n(1)) \in T_\epsilon^{(n)}(X,S_1,V_1,U) \text{ but } (S_2^n,v_1^n(1),u^n(1)) \notin T_\epsilon^{(n)}(S_2,V_1,U)\} \\
E_6 &:= \{\exists l' \neq 1 \text{ such that } b_u(u^n(l')) = 1 \text{ and } (S_2^n,v_1^n(1),u^n(l')) \in T_\epsilon^{(n)}(S_2,V_1,U)\}.
\end{align*}

The probability of error $P_e^{(n)}$ is upper bounded by $P_e^{(n)} \leq P(E_1) + P(E_2|E_1^c) + P(E_3|E_1^c,E_2^c) + P(E_4|E_1^c,E_2^c,E_3^c) + P(E_5|E_1^c,\ldots,E_4^c) + P(E_6|E_1^c,\ldots,E_5^c)$. Using standard arguments and assuming that $(X^n,S_1^n,S_2^n) \in T_\epsilon^{(n)}(X,S_1,S_2)$ and that $n$ is large enough, we can state that



1)
\begin{align*}
P(E_1) &= \Pr\Big\{ \bigcap_{v_1^n(k) \in C_v} \{(S_1^n,v_1^n(k)) \notin T_\epsilon^{(n)}(S_1,V_1)\} \Big\} \\
&\leq \prod_{k=1}^{2^{n(I(V_1;S_1)+2\epsilon)}} \Pr\{(S_1^n,v_1^n(k)) \notin T_\epsilon^{(n)}(S_1,V_1)\} \\
&\leq e^{-2^{n(I(V_1;S_1)+2\epsilon)}\, 2^{-n(I(S_1;V_1)+\epsilon)}} \\
&= e^{-2^{n\epsilon}}. \tag{100}
\end{align*}

The probability that there is no $v_1^n(k)$ in $C_v$ such that $(S_1^n,v_1^n(k))$ is strongly jointly typical is exponentially small provided that $|C_v| > 2^{n(I(S_1;V_1)+\epsilon)}$. This follows from the standard rate-distortion argument that $2^{nI(S_1;V_1)}$ $v_1^n(k)$'s "cover" $S_1^n$; therefore $P(E_1) \to 0$.

2) By the Markov lemma, since $(S_1^n,S_2^n)$ are strongly jointly typical and $(S_1^n,v_1^n(1))$ are strongly jointly typical and the Markov chain $V_1 - S_1 - S_2$ holds, then $(S_1^n,S_2^n,v_1^n(1))$ are also strongly jointly typical. Thus, $P(E_2|E_1^c) \to 0$.

3)
\begin{align*}
P(E_3) &= \Pr\Big\{ \bigcup_{\substack{v_1^n(k' \neq 1) \\ b_v(v_1^n(k')) = 1}} \{(S_2^n,v_1^n(k')) \in T_\epsilon^{(n)}(S_2,V_1)\} \Big\} \\
&\leq \sum_{\substack{v_1^n(k' \neq 1) \\ b_v(v_1^n(k')) = 1}} \Pr\{(S_2^n,v_1^n(k')) \in T_\epsilon^{(n)}(S_2,V_1)\} \\
&\leq 2^{n(I(V_1;S_1)+2\epsilon-R')}\, 2^{-n(I(S_2;V_1)-\epsilon)}. \tag{101}
\end{align*}

The probability that there is another index $k'$, $k' \neq 1$, such that $v_1^n(k')$ is in bin number 1 and that it is strongly jointly typical with $S_2^n$ is bounded by the number of $v_1^n(k)$'s in the bin times the probability of joint typicality. Therefore, if $R' > I(V_1;S_1) - I(V_1;S_2) + 3\epsilon$ then $P(E_3|E_1^c,E_2^c) \to 0$. Furthermore, using the Markov chain $V_1 - S_1 - S_2$, we can see that the inequality can be presented as $R' > I(V_1;S_1|S_2) + 3\epsilon$.

4) We use here the same argument we used for P{Ei). By the covering lemma we can state that the probability 
that there is no u'"-{l) in C„ that is strongly jointly typical with (X", S*", w"(fc)) tends to as n — ;> 00 if 
< > I{U;X,Si,Vi)+e. Hence, P{Ei\E1,E^,EI) ^ 0. 

5) Using the same argument we used for P{E2\Ef), we conclude that P{E4\Ef,E2,E^) -^ 0. 

6) We use here the same argument we used for $P(E_2|E_1^c)$. Since $(U,X,S_1,V_1)$ are strongly jointly typical, $(X,S_1,S_2)$ are strongly jointly typical and the Markov chain $(U,V_1) - (X,S_1) - S_2$ holds, $(U,X,S_1,S_2,V_1)$ are also strongly jointly typical.

7) The probability that there is another index $l'$, $l' \neq 1$, such that $u^n(l')$ is in bin number 1 and is strongly jointly typical with $(S_2^n, v_1^n(1))$ is exponentially small provided that $R > I(U;X,S_1,V_1) - I(U;S_2,V_1) + 3\epsilon = I(U;X,S_1|V_1) - I(U;S_2|V_1) + 3\epsilon$. Notice that $2^{n(I(U;X,S_1,V_1)-R)}$ stands for the average number of sequences $u^n(l)$ in each bin indexed $w$, for $w \in \{1,2,\ldots,2^{nR}\}$.

This shows that for rates R and R' as described, and for large enough n, the error events are of arbitrarily small 
probability. This concludes the proof of the achievability for the source coding Case 1 . 
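The binning step in item 3 relies on the identity $I(V_1;S_1) - I(V_1;S_2) = I(V_1;S_1|S_2)$, which holds whenever $V_1 - S_1 - S_2$ is a Markov chain. The following is a minimal numerical sketch of that identity, not part of the proof. The binary PMFs and the function names `mutual_information` and `binning_rates` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits for a joint PMF stored as a 2-D array p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    prod = p_a * p_b
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / prod[mask])))

def binning_rates(p_s1s2, p_v_given_s1):
    """Return (I(V1;S1) - I(V1;S2), I(V1;S1|S2)) for the chain V1 - S1 - S2."""
    # joint[v, s1, s2] = p(v | s1) * p(s1, s2); this enforces the Markov chain.
    joint = p_v_given_s1.T[:, :, None] * p_s1s2[None, :, :]
    difference = (mutual_information(joint.sum(axis=2))
                  - mutual_information(joint.sum(axis=1)))
    p_s2 = joint.sum(axis=(0, 1))
    # Average the per-s2 mutual information I(V1;S1 | S2 = s2) over p(s2).
    conditional = sum(
        p_s2[s2] * mutual_information(joint[:, :, s2] / p_s2[s2])
        for s2 in range(len(p_s2)) if p_s2[s2] > 0
    )
    return difference, conditional

# Assumed toy example: correlated binary (S1, S2) and a noisy description V1 of S1.
p_s1s2 = np.array([[0.4, 0.1], [0.1, 0.4]])
p_v_given_s1 = np.array([[0.9, 0.1], [0.2, 0.8]])  # rows indexed by s1
difference, conditional = binning_rates(p_s1s2, p_v_given_s1)
```

Both returned values should agree up to floating-point error, which is why the bin rate $R'$ can be stated with either expression.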

Converse: (Rate-distortion Case 1). Fix a distortion measure $D$, the rates $R'$, $R \ge R(D) = \min I(U;X,S_1|V_1) - I(U;S_2|V_1) = \min I(U;X,S_1|S_2,V_1)$ and a sequence of codes $(2^{nR}, 2^{nR'}, n)$ such that $E\big[\frac{1}{n}\sum_{i=1}^n d(X_i,\hat{X}_i)\big] = D$. Let $T_1 = f_v(S_1^n)$, $T = f(X^n,S_1^n,T_1)$ and define $V_{1,i} = (T_1, S_{1,i+1}^n, S_2^{i-1}, S_{2,i+1}^n)$ and $U_i = T$. Notice that $\hat{X}_i = \hat{X}_i(T,T_1,S_2^n)$ and, therefore, $\hat{X}_i$ is a function of $(U_i,V_{1,i},S_{2,i})$.



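$$\begin{aligned}
nR' &\ge H(T_1) \\
&\ge H(T_1|S_2^n) - H(T_1|S_1^n,S_2^n) \\
&= H(S_1^n|S_2^n) - H(S_1^n|T_1,S_2^n) \\
&= \sum_{i=1}^n \big[H(S_{1,i}|S_{1,i+1}^n,S_2^n) - H(S_{1,i}|T_1,S_{1,i+1}^n,S_2^n)\big] \\
&\stackrel{(a)}{=} \sum_{i=1}^n \big[H(S_{1,i}|S_{2,i}) - H(S_{1,i}|T_1,S_{1,i+1}^n,S_2^n)\big] \\
&= \sum_{i=1}^n \big[H(S_{1,i}|S_{2,i}) - H(S_{1,i}|V_{1,i},S_{2,i})\big] \\
&= \sum_{i=1}^n I(S_{1,i};V_{1,i}|S_{2,i}),
\end{aligned}\tag{102}$$

where (a) follows from the fact that $S_{1,i}$ is independent of $(S_{1,i+1}^n, S_2^{i-1}, S_{2,i+1}^n)$ given $S_{2,i}$.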

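$$\begin{aligned}
nR &\ge H(T) \\
&\ge H(T|T_1,S_2^n) - H(T|T_1,X^n,S_1^n,S_2^n) \\
&= I(T;X^n,S_1^n|T_1,S_2^n) \\
&= \sum_{i=1}^n \big[H(X_i,S_{1,i}|T_1,S_2^n,X_{i+1}^n,S_{1,i+1}^n) - H(X_i,S_{1,i}|T,T_1,S_2^n,X_{i+1}^n,S_{1,i+1}^n)\big] \\
&\stackrel{(b)}{=} \sum_{i=1}^n \big[H(X_i,S_{1,i}|T_1,S_{1,i+1}^n,S_2^n) - H(X_i,S_{1,i}|T,T_1,X_{i+1}^n,S_{1,i+1}^n,S_2^n)\big] \\
&\stackrel{(c)}{\ge} \sum_{i=1}^n \big[H(X_i,S_{1,i}|T_1,S_{1,i+1}^n,S_2^n) - H(X_i,S_{1,i}|T,T_1,S_{1,i+1}^n,S_2^n)\big] \\
&= \sum_{i=1}^n I(X_i,S_{1,i};T|T_1,S_{1,i+1}^n,S_2^n) \\
&= \sum_{i=1}^n I(X_i,S_{1,i};U_i|V_{1,i},S_{2,i}) \\
&\ge \sum_{i=1}^n R\big(E[d(X_i,\hat{X}_i)]\big) \\
&\stackrel{(d)}{\ge} nR\Big(E\Big[\frac{1}{n}\sum_{i=1}^n d(X_i,\hat{X}_i)\Big]\Big) \\
&= nR(D),
\end{aligned}\tag{103}$$

where (b) follows from the fact that $(X_i,S_{1,i})$ is independent of $X_{i+1}^n$ given $(T_1,S_{1,i+1}^n,S_2^n)$; this is because $X_{i+1}^n$ is independent of $(T_1,X^i,S_1^i)$ given $(S_{1,i+1}^n,S_{2,i+1}^n)$, (c) follows from the fact that conditioning reduces entropy and (d) follows from the convexity of $R(D)$ and Jensen's inequality.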

Using also the convexity of R' and Jensen's inequality, we can conclude that 



$$R' \ge I(V_1;S_1|S_2), \tag{104}$$
$$R \ge I(U;X,S_1|V_1,S_2). \tag{105}$$



It is easy to verify that $(T_1, S_{1,i+1}^n, S_2^{i-1}, S_{2,i+1}^n) - S_{1,i} - S_{2,i}$ forms a Markov chain, since $T_1(S_1^n)$ depends on $S_{2,i}$ only through $S_{1,i}$. The structure $T - (T_1, S_{1,i+1}^n, S_2^{i-1}, S_{2,i+1}^n, X_i, S_{1,i}) - S_{2,i}$ also forms a Markov chain since $S_{2,i}$ contains no information about $(S_1^{i-1}, X^{i-1}, X_{i+1}^n)$ given $(T_1, S_{1,i+1}^n, S_2^{i-1}, S_{2,i+1}^n, X_i, S_{1,i})$ and, therefore, contains no information about $T(X^n,S_1^n,T_1)$.

This concludes the converse, and the proof of Theorem 2, Case 1.

B. Proof of Theorem 2, Case 1c

For describing the ESI, $S_1$, with a rate $R'$ we use the standard rate-distortion coding scheme. Then, for the main source, $X$, we use a Weissman-El Gamal [12] coding scheme where the DSI, $S_2$, is the causal side information at the decoder. $S_1$ is a part of the source and the rate-limited description of $S_1$ is the side information at both the encoder and the decoder.

Fig. 19: Rate-distortion: Case 1 with causal DSI. $R_{1c}(D) = \min I(U;X,S_1|V_1)$, where the minimization is over all PMFs $p(v_1|s_1)p(u|x,s_1,v_1)p(\hat{x}|u,s_2,v_1)$ such that $R' \ge I(V_1;S_1)$ and $E[d(X,\hat{X})] \le D$.

Achievability: (Rate-distortion Case 1c). Given $(X_i,S_{1,i},S_{2,i}) \sim$ i.i.d. $p(x,s_1,s_2)$, where the DSI is known in a causal way ($S_2^i$ at time $i$) and the distortion measure is $D$, fix $p(x,s_1,s_2,v_1,u,\hat{x}) = p(x,s_1,s_2)p(v_1|s_1)p(u|x,s_1,v_1)p(\hat{x}|u,s_2,v_1)$ that satisfies $E[d(X,\hat{X})] = D$ and $\hat{x} = f(u,s_2,v_1)$.

Codebook generation and random binning 

1) Generate a codebook $C_v$ of $2^{n(I(V_1;S_1)+2\epsilon)}$ sequences $V_1^n$ independently using i.i.d. $\sim p(v_1)$. Label them $v_1^n(k)$, where $k \in \{1,2,\ldots,2^{n(I(V_1;S_1)+2\epsilon)}\}$.

2) For each $v_1^n(k)$ generate a codebook $C_u(k)$ of $2^{n(I(U;X,S_1|V_1)+2\epsilon)}$ sequences $U^n$ distributed independently according to i.i.d. $\sim p(u|v_1)$. Label them $u^n(w,k)$, where $w \in \{1,2,\ldots,2^{n(I(U;X,S_1|V_1)+2\epsilon)}\}$.

Reveal the codebooks to all encoders and decoders. 

Encoding 

1) State Encoder: Given the sequence $S_1^n$, search the codebook $C_v$ and identify an index $k$ such that $(v_1^n(k),S_1^n) \in T_\epsilon^{(n)}(V_1,S_1)$. If such a $k$ is found, stop searching and send it. Otherwise, declare an error.

2) Encoder: Given $X^n$, $S_1^n$ and the index $k$, search the codebook $C_u(k)$ and identify an index $w$ such that $(u^n(w,k),X^n,S_1^n) \in T_\epsilon^{(n)}(U,X,S_1|v_1^n(k))$. If such an index $w$ is found, stop searching and send it. Otherwise, declare an error.

Decoding 

Given the indices $w$, $k$ and the sequence $S_2^i$ at time $i$, declare $\hat{x}_i = f(u_i(w,k), S_{2,i}, v_{1,i}(k))$.

Analysis of the probability of error 

Without loss of generality, let us assume that $v_1^n(1)$ corresponds to $S_1^n$ and that $u^n(1,1)$ corresponds to $(X^n,S_1^n,v_1^n(1))$.

Define the following events:

$$E_1 := \big\{\forall\, v_1^n(k) \in C_v,\ (S_1^n, v_1^n(k)) \notin T_\epsilon^{(n)}(S_1,V_1)\big\},$$
$$E_2 := \big\{\forall\, u^n(w,1) \in C_u(1),\ (X^n,S_1^n,u^n(w,1)) \notin T_\epsilon^{(n)}(X,S_1,U)\big\}.$$

The probability of error $P_e^{(n)}$ is upper bounded by $P_e^{(n)} \le P(E_1) + P(E_2|E_1^c)$. Assuming that $(S_1^n,S_2^n) \in T_\epsilon^{(n)}(S_1,S_2)$, we can state that, by the standard rate-distortion argument, having more than $2^{n(I(V_1;S_1)+\epsilon)}$ sequences $v_1^n(k)$ in $C_v$ and a large enough $n$ assures us with probability arbitrarily close to 1 that we would find an index $k$ such that $(v_1^n(k),S_1^n) \in T_\epsilon^{(n)}(V_1,S_1)$. Therefore, $P(E_1) \to 0$ as $n \to \infty$. Now, if $(v_1^n(1),S_1^n) \in T_\epsilon^{(n)}(V_1,S_1)$, using the same argument, we can also state that having more than $2^{n(I(U;X,S_1|V_1)+\epsilon)}$ sequences $u^n(w,1)$ in $C_u(1)$ assures us that $P(E_2|E_1^c) \to 0$ as $n \to \infty$. This concludes the proof of the achievability.

Converse: (Rate-distortion Case 1c). Fix a distortion measure $D$, the rates $R'$, $R \ge R(D) = \min I(U;X,S_1|V_1)$ and a sequence of codes $(2^{nR}, 2^{nR'}, n)$ such that $E\big[\frac{1}{n}\sum_{i=1}^n d(X_i,\hat{X}_i)\big] = D$. Let $T_1 = f_v(S_1^n)$, $T = f(X^n,S_1^n,T_1)$ and define $V_{1,i} = (T_1,S_{1,i+1}^n)$, $U_i = T$. Notice that $\hat{X}_i = \hat{X}_i(T,T_1,S_2^i)$ and, therefore, $\hat{X}_i$ is a function of $(U_i,V_{1,i},S_2^i)$.



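$$\begin{aligned}
nR' &\ge H(T_1) \\
&\ge H(T_1) - H(T_1|S_1^n) \\
&= \sum_{i=1}^n \big[H(S_{1,i}|S_{1,i+1}^n) - H(S_{1,i}|T_1,S_{1,i+1}^n)\big] \\
&\stackrel{(a)}{=} \sum_{i=1}^n \big[H(S_{1,i}) - H(S_{1,i}|T_1,S_{1,i+1}^n)\big] \\
&= \sum_{i=1}^n \big[H(S_{1,i}) - H(S_{1,i}|V_{1,i})\big] \\
&= \sum_{i=1}^n I(S_{1,i};V_{1,i}),
\end{aligned}\tag{106}$$

where (a) follows from the fact that $S_{1,i}$ is independent of $S_{1,i+1}^n$.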

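$$\begin{aligned}
nR &\ge H(T) \\
&\ge H(T|T_1) - H(T|T_1,X^n,S_1^n) \\
&= I(T;X^n,S_1^n|T_1) \\
&= \sum_{i=1}^n \big[H(X_i,S_{1,i}|T_1,X_{i+1}^n,S_{1,i+1}^n) - H(X_i,S_{1,i}|T,T_1,X_{i+1}^n,S_{1,i+1}^n)\big] \\
&\stackrel{(b)}{=} \sum_{i=1}^n \big[H(X_i,S_{1,i}|T_1,S_{1,i+1}^n) - H(X_i,S_{1,i}|T,T_1,X_{i+1}^n,S_{1,i+1}^n)\big] \\
&\stackrel{(c)}{\ge} \sum_{i=1}^n \big[H(X_i,S_{1,i}|T_1,S_{1,i+1}^n) - H(X_i,S_{1,i}|T,T_1,S_{1,i+1}^n)\big] \\
&= \sum_{i=1}^n I(X_i,S_{1,i};T|T_1,S_{1,i+1}^n) \\
&= \sum_{i=1}^n I(X_i,S_{1,i};U_i|V_{1,i}) \\
&\ge \sum_{i=1}^n R\big(E[d(X_i,\hat{X}_i)]\big) \\
&\stackrel{(d)}{\ge} nR\Big(E\Big[\frac{1}{n}\sum_{i=1}^n d(X_i,\hat{X}_i)\Big]\Big) \\
&= nR(D),
\end{aligned}\tag{107}$$

where (b) follows from the fact that $(X_i,S_{1,i})$ is independent of $X_{i+1}^n$ given $(T_1,S_{1,i+1}^n)$, (c) follows from the fact that conditioning reduces entropy and (d) follows from the convexity of $R(D)$ and Jensen's inequality.
Using also the convexity of $R'$ and Jensen's inequality, we can conclude that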



$$R' \ge I(V_1;S_1), \tag{108}$$
$$R \ge I(U;X,S_1|V_1). \tag{109}$$



It is easy to verify that both Markov chains $V_{1,i} - S_{1,i} - (X_i,S_{2,i})$ and $U_i - (X_i,S_{1,i},V_{1,i}) - S_{2,i}$ hold. This concludes the converse, and the proof of Theorem 2, Case 1c.

C. Proof of Theorem 2, Case 2

Fig. 20: Rate-distortion: Case 2. $R_2(D) = \min I(U;X,S_1|V_2) - I(U;S_2|V_2)$, where the minimization is over all PMFs $p(v_2|s_2)p(u|x,s_1,v_2)p(\hat{x}|u,s_2,v_2)$ such that $R' \ge I(V_2;S_2) - I(V_2;X,S_1)$ and $E[d(X,\hat{X})] \le D$.

This problem is a special case of [10] for $K = 1$, and hence, the proof is omitted.

Appendix D 
Proof of Lemma 1 

We provide here a partial proof of Lemma 1. In the first part we prove the concavity of $C_2^{lb}(R')$ in $R'$ for Case 2; the second part contains the proof that it is enough to take $X$ to be a deterministic function of $(S_1,V_1,U)$ in order to achieve the capacity $C_1(R')$ for Case 1; and in the third part we prove the cardinality bound for Case 1. The proofs of these three parts for the rest of the cases can be derived using the same techniques and therefore are omitted. The proof of Lemma 2 can also be readily concluded using the techniques we use in this appendix and is omitted as well.



Part 1: We prove here that for Case 2 of the channel capacity problems, the lower bound on the capacity, $C_2^{lb}(R')$, is a concave function of the state information rate, $R'$. Recall that the expression for $C_2^{lb}$ is $C_2^{lb}(R') = \max I(U;Y,S_2|V_2) - I(U;S_1|V_2)$, where the maximization is over all probabilities $p(s_1,s_2)p(v_2|s_2)p(u|s_1,v_2)p(x|u,s_1,v_2)p(y|x,s_1,s_2)$ such that $R' \ge I(V_2;S_2|S_1)$. This means that we want to prove that for any two rates, $R'^{(1)}$ and $R'^{(2)}$, and for any $0 \le \alpha \le 1$ and $\bar{\alpha} = 1 - \alpha$, the capacity maintains $C_2^{lb}(\alpha R'^{(1)} + \bar{\alpha} R'^{(2)}) \ge \alpha C_2^{lb}(R'^{(1)}) + \bar{\alpha} C_2^{lb}(R'^{(2)})$. Let $(U^{(1)},V_2^{(1)},X^{(1)},Y^{(1)})$ and $(U^{(2)},V_2^{(2)},X^{(2)},Y^{(2)})$ be the random variables that meet the conditions on $R'^{(1)}$ and on $R'^{(2)}$ and also achieve $C_2^{lb}(R'^{(1)})$ and $C_2^{lb}(R'^{(2)})$, respectively. Let us introduce the auxiliary random variable $Q \in \{1,2\}$, independent of $S_1,S_2,V_2,U,X$ and $Y$, and distributed according to $\Pr\{Q=1\} = \alpha$ and $\Pr\{Q=2\} = \bar{\alpha}$. Then, consider

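$$\begin{aligned}
\alpha R'^{(1)} + \bar{\alpha} R'^{(2)} &= \alpha\big[I(V_2^{(1)};S_2) - I(V_2^{(1)};S_1)\big] + \bar{\alpha}\big[I(V_2^{(2)};S_2) - I(V_2^{(2)};S_1)\big] \\
&\stackrel{(a)}{=} \alpha\big[I(V_2^{(1)};S_2|Q=1) - I(V_2^{(1)};S_1|Q=1)\big] + \bar{\alpha}\big[I(V_2^{(2)};S_2|Q=2) - I(V_2^{(2)};S_1|Q=2)\big] \\
&= I(V_2^{(Q)};S_2|Q) - I(V_2^{(Q)};S_1|Q) \\
&\stackrel{(b)}{=} I(V_2^{(Q)},Q;S_2) - I(V_2^{(Q)},Q;S_1),
\end{aligned}\tag{110}$$

and

$$\begin{aligned}
\alpha C_2^{lb}(R'^{(1)}) + \bar{\alpha} C_2^{lb}(R'^{(2)}) &= \alpha\big[I(U^{(1)};Y^{(1)},S_2|V_2^{(1)}) - I(U^{(1)};S_1|V_2^{(1)})\big] \\
&\quad + \bar{\alpha}\big[I(U^{(2)};Y^{(2)},S_2|V_2^{(2)}) - I(U^{(2)};S_1|V_2^{(2)})\big] \\
&\stackrel{(c)}{=} \alpha\big[I(U^{(1)};Y^{(1)},S_2|V_2^{(1)},Q=1) - I(U^{(1)};S_1|V_2^{(1)},Q=1)\big] \\
&\quad + \bar{\alpha}\big[I(U^{(2)};Y^{(2)},S_2|V_2^{(2)},Q=2) - I(U^{(2)};S_1|V_2^{(2)},Q=2)\big] \\
&\stackrel{(d)}{=} I(U^{(Q)};Y^{(Q)},S_2|V_2^{(Q)},Q) - I(U^{(Q)};S_1|V_2^{(Q)},Q),
\end{aligned}\tag{111}$$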

where (a), (b), (c) and (d) all follow from the fact that $Q$ is independent of $(S_1,S_2,V_2,U,X,Y)$ and from $Q$'s probability distribution. Now, let $V_2' = (V_2^{(Q)},Q)$, $U' = U^{(Q)}$, $Y' = Y^{(Q)}$ and $X' = X^{(Q)}$. Then, following from the equalities above, for any two rates $R'^{(1)}$ and $R'^{(2)}$ and for any $0 \le \alpha \le 1$, there exists a set of random variables $(U',V_2',X',Y')$ that maintains

$$\alpha R'^{(1)} + \bar{\alpha} R'^{(2)} = I(V_2';S_2) - I(V_2';S_1), \tag{112}$$

and

$$\begin{aligned}
C_2^{lb}(\alpha R'^{(1)} + \bar{\alpha} R'^{(2)}) &\ge I(U';Y',S_2|V_2') - I(U';S_1|V_2') \\
&= \alpha C_2^{lb}(R'^{(1)}) + \bar{\alpha} C_2^{lb}(R'^{(2)}).
\end{aligned}\tag{113}$$

This completes the proof of the concavity of $C_2^{lb}(R')$ in $R'$. □
Part 2: We prove here that it is enough to take $X$ to be a deterministic function of $(U,S_1,V_1)$ in order to maximize $I(U;Y,S_2|V_1) - I(U;S_1|V_1)$. Fix $p(u,v_1|s_1)$. Note that

$$\begin{aligned}
p(y,s_2|u,v_1) &= \sum_{x,s_1} p(s_1|u,v_1)\,p(s_2|s_1,v_1,u)\,p(x|s_1,s_2,v_1,u)\,p(y|x,s_1,s_2,v_1,u) \\
&= \sum_{x,s_1} p(s_1|u,v_1)\,p(s_2|s_1)\,p(x|s_1,v_1,u)\,p(y|x,s_1,s_2)
\end{aligned}\tag{114}$$

is linear in $p(x|u,v_1,s_1)$. This follows from the fact that fixing $p(u,v_1|s_1)$ also defines $p(s_1|u,v_1)$ and from the following Markov chains: $S_2 - S_1 - (V_1,U)$, $X - (S_1,V_1,U) - S_2$ and $Y - (X,S_1,S_2) - (V_1,U)$. Hence, since $I(U;Y,S_2|V_1)$ is convex in $p(y,s_2|u,v_1)$, it is also convex in $p(x|u,v_1,s_1)$. Noting also that $I(U;S_1|V_1)$ is constant given a fixed $p(u,v_1|s_1)$, we can conclude that $I(U;Y,S_2|V_1) - I(U;S_1|V_1)$ is convex in $p(x|u,v_1,s_1)$ and, hence, it attains its maximum at the boundaries of $p(x|u,v_1,s_1)$, i.e., when the latter is equal to 0 or 1. This implies that $X$ can be expressed as a deterministic function of $(U,V_1,S_1)$. □
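The convexity argument of Part 2 can be illustrated numerically in a deliberately simplified setting. The sketch below assumes there is no state at all: it fixes $p(u)$ and a channel $p(y|x)$, and compares $I(U;Y)$ over all deterministic maps $u \mapsto x$ against randomly drawn stochastic maps $p(x|u)$. The alphabet sizes, the PMFs, and the helper names are assumptions made for illustration only; this is a sanity check of the principle, not the paper's setting.

```python
import itertools
import numpy as np

def mutual_information(p_u, p_y_given_u):
    """I(U;Y) in bits for input PMF p_u[u] and transition matrix p_y_given_u[u, y]."""
    joint = p_u[:, None] * p_y_given_u
    p_y = joint.sum(axis=0, keepdims=True)
    prod = p_u[:, None] * p_y
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / prod[mask])))

def best_deterministic(p_u, channel):
    """Largest I(U;Y) over all deterministic maps u -> x (vertices of the feasible set)."""
    n_u, n_x = len(p_u), channel.shape[0]
    best = -1.0
    for mapping in itertools.product(range(n_x), repeat=n_u):
        p_x_given_u = np.eye(n_x)[list(mapping)]  # one-hot rows
        best = max(best, mutual_information(p_u, p_x_given_u @ channel))
    return best

def best_random(p_u, channel, trials=2000, seed=0):
    """Largest I(U;Y) found among randomly drawn stochastic maps p(x|u)."""
    rng = np.random.default_rng(seed)
    n_u, n_x = len(p_u), channel.shape[0]
    best = -1.0
    for _ in range(trials):
        p_x_given_u = rng.dirichlet(np.ones(n_x), size=n_u)
        best = max(best, mutual_information(p_u, p_x_given_u @ channel))
    return best

p_u = np.array([0.5, 0.3, 0.2])                # assumed input PMF
channel = np.array([[0.9, 0.1], [0.2, 0.8]])   # assumed p(y|x), rows indexed by x
det_value = best_deterministic(p_u, channel)
rand_value = best_random(p_u, channel)
```

Because $I(U;Y)$ is convex in $p(y|u)$ for fixed $p(u)$ and $p(y|u)$ is linear in $p(x|u)$, no stochastic map should exceed the best deterministic one, so `det_value` is expected to be at least `rand_value`.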

Part 3: We prove now the cardinality bound for Theorem [T] First, let us recall the support lemma lISTI p. 310]. 
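Let $\mathcal{P}(\mathcal{Z})$ be the set of PMFs on the set $\mathcal{Z}$, and let the set $\mathcal{P}(\mathcal{Z}|\mathcal{Q}) \subseteq \mathcal{P}(\mathcal{Z})$ be a collection of PMFs $p(z|q)$ on $\mathcal{Z}$ indexed by $q \in \mathcal{Q}$. Let $g_j$, $j = 1,\ldots,k$, be continuous functions on $\mathcal{P}(\mathcal{Z}|\mathcal{Q})$. Then, for any $Q \sim F_Q(q)$, there exists a finite random variable $Q' \sim p(q')$ taking at most $k$ values in $\mathcal{Q}$ such that

$$E\big[g_j(p_{Z|Q}(z|Q))\big] = \int_{\mathcal{Q}} g_j(p_{Z|Q}(z|q))\,dF(q) = \sum_{q'} g_j(p_{Z|Q}(z|q'))\,p(q'). \tag{115}$$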

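We first reduce the alphabet size of $V_1$ while considering the alphabet size of $U$ to be constant, and then we calculate the cardinality of $U$. Consider the following continuous functions of $p(x,s_1,s_2,u|v_1)$:

$$g_j = \begin{cases}
P_{XS_1S_2|V_1}(j|v_1), & j \in \{1,2,\ldots,|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2|-1\}, \\
I(V_1;S_1) - I(V_1;Y,S_2), & j = |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2|, \\
I(U;Y,S_2|V_1=v_1) - I(U;S_1|V_1=v_1), & j = |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1.
\end{cases}\tag{116}$$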

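Then, by the support lemma, there exists a random variable $V_1'$ with $|\mathcal{V}_1'| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1$ such that $p(x,s_1,s_2)$, $I(V_1;S_1) - I(V_1;Y,S_2)$ and $I(U;Y,S_2|V_1) - I(U;S_1|V_1)$ are preserved. Notice that the probability of $U$ might have changed due to changing $V_1$; we denote the corresponding $U$ as $U'$. Next, for $v_1' \in \mathcal{V}_1'$ and the corresponding probability $p(v_1')$ that we found in the previous step, we consider $|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2||\mathcal{V}_1'|$ continuous functions of $p(x,s_1,s_2,v_1'|u')$:

$$g_j = \begin{cases}
P_{XS_1S_2V_1'|U'}(j|u'), & j \in \{1,2,\ldots,|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2||\mathcal{V}_1'|-1\}, \\
I(U';Y,S_2|V_1') - I(U';S_1|V_1'), & j = |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2||\mathcal{V}_1'|.
\end{cases}\tag{117}$$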

Thus, there exists a random variable $U''$ with $|\mathcal{U}''| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2||\mathcal{V}_1'|$ such that the mutual information expressions above and all the desired Markov conditions are preserved. Notice that the expression $I(V_1;S_1) - I(V_1;Y,S_2)$ is preserved since $p(x,s_1,s_2,v_1')$ is preserved.
To conclude, we can bound the cardinality of the auxiliary random variables of Theorem 1, Case 1 by $|\mathcal{V}_1| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1$ and $|\mathcal{U}| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2||\mathcal{V}_1| \le |\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2|(|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1)$ without limiting the generality of the solution. □
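As a quick arithmetic aid, the two cardinality bounds of Part 3 can be evaluated for given alphabet sizes. The helper below is a hypothetical convenience function (its name and signature are not from the paper); it simply evaluates $|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1$ and $|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2|(|\mathcal{X}||\mathcal{S}_1||\mathcal{S}_2| + 1)$.

```python
def cardinality_bounds(n_x, n_s1, n_s2):
    """Return (bound on |V1|, bound on |U|) for Theorem 1, Case 1."""
    base = n_x * n_s1 * n_s2
    v1_bound = base + 1          # |V1| is at most |X||S1||S2| + 1
    u_bound = base * v1_bound    # |U| is at most |X||S1||S2| (|X||S1||S2| + 1)
    return v1_bound, u_bound

# All-binary alphabets: base = 8, so the bounds are 9 and 72.
binary_bounds = cardinality_bounds(2, 2, 2)
```

For all-binary alphabets this gives 9 and 72, which indicates the search space a numerical solver would have to cover in the worst case.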

Appendix E 
Proof of Theorem 3

Proof: First, let us formulate the Lagrangian for the primal optimization problem defined in (40):

$$\begin{aligned}
L(q,\mu,\gamma,\lambda) &= \sum_{x,s,t} p(x,s)q(t|x)\log\frac{q(t|x)}{Q(t|s)} + \sum_x \mu_x\Big(\sum_t q(t|x) - 1\Big) \\
&\quad + \gamma\Big(\sum_{x,s,t} p(x,s)q(t|x)d(x,t(s)) - D\Big) - \sum_{x,t}\lambda_{x,t}\,q(t|x),
\end{aligned}\tag{118}$$

with Lagrange multipliers $\mu$, $\gamma \ge 0$ and $\lambda \succeq 0$. Recall that $Q(t|s)$ is a marginal distribution that corresponds to $q(t|x)$, i.e.,

$$Q(t|s) = \frac{\sum_x p(x,s)q(t|x)}{p(s)}. \tag{119}$$

In addition, recall the definition of the Lagrange dual function,

$$g(\mu,\gamma,\lambda) = \inf_q L(q,\mu,\gamma,\lambda). \tag{120}$$

In the following proof, we use $q^*_{\mu,\gamma,\lambda}$ to denote the optimal minimizer of the Lagrangian, $L(q,\mu,\gamma,\lambda)$, for any fixed $\mu$, $\gamma$ and $\lambda$. We also use the notation $g(\mu,\gamma,\lambda|q^*_{\mu,\gamma,\lambda})$ to denote the Lagrange dual function with $q^*_{\mu,\gamma,\lambda}$ as a constant parameter.

The outline of the proof is as follows: we first find the PMF $q^*_{\mu,\gamma,\lambda}$, which is the minimizer of the Lagrangian, $L(q,\mu,\gamma,\lambda)$. We then formulate the Lagrange dual function, $g(\mu,\gamma,\lambda|q^*_{\mu,\gamma,\lambda})$, and the Lagrange dual problem, which is to maximize $g$ over $\mu$, $\gamma \ge 0$ and $\lambda \succeq 0$. Next, we argue that we can maximize $g$ over $\mu$, $\gamma \ge 0$, $\lambda \succeq 0$ and, in addition, over any $q$ that nullifies the derivative of the Lagrangian (i.e., maintains equation (123)) without increasing the solution of the Lagrange dual problem. We then note that it is possible to write the Lagrange dual problem with the variable $p(x|s,t)$ instead of $q(t|x)$, where $p(x|s,t)$ is a marginal distribution associated with $q(t|x)$, i.e., $p(x|s,t) = \frac{p(x|s)q(t|x)}{Q(t|s)}$, and is constrained to maintain the Markov chain $T - X - S$. Our next key step is to prove that we can omit the Markov chain constraint without increasing the maximal value of the Lagrange dual problem. We then conclude our proof by formulating the Lagrange dual problem that we obtained in a geometric programming convex form.

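In order to formulate $g(\mu,\gamma,\lambda)$, we first find the PMF $q^*_{\mu,\gamma,\lambda}$ that minimizes the Lagrangian, $L(q,\mu,\gamma,\lambda)$, which is a convex function of $q$. First, notice that

$$\begin{aligned}
\frac{\partial}{\partial q(t|x)} \sum_{x',s',t'} p(x',s')q(t'|x')\log\frac{q(t'|x')}{Q(t'|s')}
&\stackrel{(a)}{=} \sum_{s'} p(x,s')\log\frac{q(t|x)}{Q(t|s')} + \sum_{s'} p(x,s') - \sum_{x',s'} p(x',s')q(t|x')\frac{p(x,s')}{p(s')}\frac{1}{Q(t|s')} \\
&\stackrel{(b)}{=} \sum_{s'} p(x,s')\log\frac{q(t|x)}{Q(t|s')} + p(x) - \sum_{s'} p(x,s') \\
&= \sum_{s'} p(x,s')\log\frac{q(t|x)}{Q(t|s')},
\end{aligned}\tag{121}$$

where (a) follows from the fact that

$$\frac{\partial Q(t'|s')}{\partial q(t|x)} = \frac{\partial}{\partial q(t|x)}\,\frac{\sum_{x''} p(x'',s')q(t'|x'')}{p(s')} = \begin{cases} \dfrac{p(x,s')}{p(s')}, & t' = t, \\ 0, & t' \neq t, \end{cases}\tag{122}$$

and (b) follows from the fact that $p(x,s')$ is independent of $x'$ and the fact that $\sum_{x'} p(x',s')q(t|x')\frac{1}{p(s')} = Q(t|s')$.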



Next, we formulate the derivative of the Lagrangian with respect to $q(t|x)$ and we constrain it to be equal to 0:

$$\frac{\partial L}{\partial q(t|x)} = \sum_s p(x,s)\log\frac{q(t|x)}{Q(t|s)} + \mu_x + \gamma\sum_s p(x,s)d(x,t(s)) - \lambda_{x,t} = 0. \tag{123}$$

Using elementary mathematical manipulations we get

$$\log q(t|x) = \sum_s p(s|x)\Big[\log Q(t|s) - \frac{\mu_x}{p(x)} - \gamma d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)}\Big]. \tag{124}$$

Hence,

$$q^*_{\mu,\gamma,\lambda}(t|x) = \prod_s \Big[Q^*_{\mu,\gamma,\lambda}(t|s)\exp\Big\{-\frac{\mu_x}{p(x)} - \gamma d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)}\Big\}\Big]^{p(s|x)} \tag{125}$$

is an optimal minimizer of the Lagrangian. We get the Lagrange dual function by substituting $q$ in the Lagrangian with $q^*_{\mu,\gamma,\lambda}$, given in (125), and by using constraint (123):



$$g(\mu,\gamma,\lambda|q^*_{\mu,\gamma,\lambda}) = \begin{cases}
-\sum_x \mu_x - \gamma D, & \sum_s p(x,s)\log\dfrac{q^*_{\mu,\gamma,\lambda}(t|x)}{Q^*_{\mu,\gamma,\lambda}(t|s)} + \mu_x + \gamma\sum_s p(x,s)d(x,t(s)) - \lambda_{x,t} = 0 \ \ \forall x,t, \\
-\infty, & \text{otherwise.}
\end{cases}\tag{126}$$

We get the Lagrange dual problem by making the constraints explicit:

$$\begin{aligned}
\text{maximize} \quad & -\sum_x \mu_x - \gamma D \\
\text{subject to} \quad & \sum_s p(x,s)\log\frac{q^*_{\mu,\gamma,\lambda}(t|x)}{Q^*_{\mu,\gamma,\lambda}(t|s)} + \mu_x + \gamma\sum_s p(x,s)d(x,t(s)) - \lambda_{x,t} = 0, \ \forall x,t, \\
& \gamma \ge 0, \\
& \lambda_{x,t} \ge 0, \ \forall x,t,
\end{aligned}\tag{127}$$

where the maximization variables are $\mu$, $\gamma$ and $\lambda$ and the constant parameters are the PMFs $q^*_{\mu,\gamma,\lambda}$ and $p(x,s)$, the distortion measure $d(x,t(s))$ and the distortion constraint $D$. Notice that since the primal problem, (40), is a convex problem with an optimal value of $R(D)$, the solution of (127) is a lower bound on $R(D)$ [28, Chapter 5.2.2], and, if Slater's condition holds, then strong duality holds and the optimal value of (127) is $R(D)$.



Now, notice that any $q$ that maintains the first constraint in (127) nullifies the derivative of the Lagrangian and, hence, results in the same value when placed in the Lagrangian; this value is exactly the Lagrange dual function. Therefore, since $g$ gets the same value for any $q$ that maintains the constraint (123), we can maximize $g$ over all PMFs $q$ that maintain constraint (123) without changing $g$'s value. Consequently, the Lagrange dual problem in (127) becomes:

$$\begin{aligned}
\text{maximize} \quad & -\sum_x \mu_x - \gamma D \\
\text{subject to} \quad & \sum_s p(x,s)\log\frac{q(t|x)}{Q(t|s)} + \mu_x + \gamma\sum_s p(x,s)d(x,t(s)) - \lambda_{x,t} = 0, \ \forall x,t, \\
& \gamma \ge 0, \\
& \lambda_{x,t} \ge 0, \ \forall x,t, \\
& \sum_t q(t|x) = 1, \ \forall x,
\end{aligned}\tag{128}$$

where the maximization variables are $\mu$, $\gamma$, $\lambda$ and $q$ and the constant parameters are $p(x,s)$, $d(x,t(s))$ and $D$.



Next, combining (125) and the fact that $Q(t|s) \ge 0$, we get that we can replace the first constraint in (128) with

$$q(t|x) = \prod_s \Big[Q(t|s)\exp\Big\{-\frac{\mu_x}{p(x)} - \gamma d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)}\Big\}\Big]^{p(s|x)}, \quad \forall x,t. \tag{129}$$

Since $q(t|x)$ is independent of $s$, we can state that

$$1 = \prod_s \Big[\frac{Q(t|s)}{q(t|x)}\exp\Big\{-\frac{\mu_x}{p(x)} - \gamma d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)}\Big\}\Big]^{p(s|x)}. \tag{130}$$

Let us denote $\alpha_x = -\frac{\mu_x}{p(x)}$ and note that $\frac{q(t|x)}{Q(t|s)} = \frac{p(x|s,t)}{p(x|s)}$, where $p(x|s,t)$ maintains the Markov chain $T - X - S$. Therefore, equation (130) becomes

$$1 = \prod_s \Big[p(x|s)\exp\Big\{\alpha_x - \gamma d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)} - \log p(x|s,t)\Big\}\Big]^{p(s|x)} \tag{131}$$

for all $x$, $t$, and the Lagrange dual problem can be reformulated as

$$\begin{aligned}
\text{maximize} \quad & \sum_x \alpha_x p(x) - \gamma D \\
\text{subject to} \quad & 1 = \prod_s \Big[p(x|s)\exp\Big\{\alpha_x - \gamma d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)} - \log p(x|s,t)\Big\}\Big]^{p(s|x)}, \ \forall x,t, \\
& \gamma \ge 0, \\
& \sum_x p(x|s,t) = 1, \ \forall s,t, \\
& p(x|s,t) \text{ maintains the Markov chain } T - X - S,
\end{aligned}\tag{132}$$

where the variables of the maximization are $\alpha$, $\gamma$, $\lambda$ and $p \in \mathbb{R}^{|\mathcal{X}||\mathcal{S}||\mathcal{T}|}$, which is the set of all $p(x|s,t)$ for all $x \in \mathcal{X}$, $s \in \mathcal{S}$ and $t \in \mathcal{T}$, and the constant parameters are $p(x,s)$, $d(x,t(s))$ and $D$. Notice that (132) is not a convex problem anymore, since the constraint functions are not convex. We deal with this problem in the following steps by using geometric programming principles.



Next, we want to prove that it is possible to maximize (132) over any PMF, $p$, i.e., we want to prove that dropping the last constraint in (132) does not change the validity of the solution.

First, since (132) is an equivalent Lagrange dual problem, then, according to [28, Chapter 5.2.2], we can state that for any choice of $\alpha$, $\gamma$ and $\lambda$ it yields a lower bound on $R(D)$. Furthermore, according to [28, Chapter 5.2.3], if Slater's condition holds, then the solution of (132) coincides with $R(D)$, which is the optimal solution of the primal problem. Now, dropping the constraint that the Markov chain $T - X - S$ must hold necessarily allows the optimal solution of (132) to be greater than or equal to the solution where $T - X - S$ holds. We are left to prove that maximizing over any PMF, $p$, cannot exceed $R(D)$. Let us place $p(x|s,t) = \frac{p(t|x,s)p(x|s)}{p(t|s)}$ in (131) and look at the following inequalities:



$$\begin{aligned}
1 &= \prod_s \Big[p(x|s)\exp\Big\{\alpha_x - \gamma d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)} - \log\frac{p(t|x,s)p(x|s)}{p(t|s)}\Big\}\Big]^{p(s|x)} \\
&= \prod_s \Big[\exp\Big\{\log p(x|s) + \alpha_x - \gamma d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)} - \log\frac{p(t|x,s)p(x|s)}{p(t|s)}\Big\}\Big]^{p(s|x)} \\
&= \exp\Big\{\alpha_x - \gamma\sum_s p(s|x)d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)} - \sum_s p(s|x)\log p(t|x,s) + \sum_s p(s|x)\log p(t|s)\Big\} \\
&\stackrel{(a)}{\ge} \exp\Big\{\alpha_x - \gamma\sum_s p(s|x)d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)} - \log\Big(\sum_s p(s|x)p(t|x,s)\Big) + \sum_s p(s|x)\log p(t|s)\Big\} \\
&= \exp\Big\{\alpha_x - \gamma\sum_s p(s|x)d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)} - \log p(t|x) + \sum_s p(s|x)\log p(t|s)\Big\} \\
&= \exp\Big\{\alpha_x - \gamma\sum_s p(s|x)d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)} - \sum_s p(s|x)\log\frac{p(t|x)p(x|s)}{p(t|s)} + \sum_s p(s|x)\log p(x|s)\Big\} \\
&\stackrel{(b)}{=} \prod_s \Big[p(x|s)\exp\Big\{\alpha_x - \gamma d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)} - \log\frac{p(t|x)p(x|s)}{p(t|s)}\Big\}\Big]^{p(s|x)},
\end{aligned}\tag{133}$$

where (a) follows from Jensen's inequality and (b) follows from the fact that $p(t|x)$ is independent of $s$. Notice that by reducing the value of $-\sum_s p(s|x)\log p(t|x,s)$, we allow $\alpha_x - \gamma\sum_s p(s|x)d(x,t(s))$ to be greater and, hence, we improve our maximum. Therefore, for any $p(x|s,t) = \frac{p(t|x,s)p(x|s)}{p(t|s)}$, we can take $p'(x|s,t) = \frac{p(t|x)p(x|s)}{p(t|s)}$ with $p(t|x) = \sum_{s'} p(s'|x)p(t|x,s')$, which satisfies the Markov chain $T - X - S$, and the maximum over $p(t|x) = \sum_s p(s|x)p(t|x,s)$ would be equal to or greater than the maximum over $p(x|s,t)$. This, and the fact that maximizing over $p(t|x)$ cannot exceed $R(D)$ and that $R(D)$ can be achieved by using $p^*(x|s,t)$ that corresponds to $q^*(t|x)$, prove that, indeed, we can maximize over $p(x|s,t)$ without changing the result of the maximization. Therefore, our dual problem now becomes



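$$\begin{aligned}
\text{maximize} \quad & \sum_x \alpha_x p(x) - \gamma D \\
\text{subject to} \quad & \prod_s \Big[p(x|s)\exp\Big\{\alpha_x - \gamma d(x,t(s)) + \frac{\lambda_{x,t}}{p(x)} - \log p(x|s,t)\Big\}\Big]^{p(s|x)} = 1, \ \forall x,t, \\
& \sum_x p(x|s,t) = 1, \ \forall s,t, \\
& \gamma \ge 0.
\end{aligned}\tag{134}$$

In order to make the problem convex, we need to convert the equality constraints that are not affine into inequality constraints. Let us go back to (131): since $\lambda_{x,t} \ge 0$ for all $x$ and $t$ and since $p(x,s) \ge 0$, the constraint (131) can be replaced by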

$$1 \ge \prod_s \Big[p(x|s)\exp\big\{\alpha_x - \gamma d(x,t(s)) - \log p(x|s,t)\big\}\Big]^{p(s|x)} \tag{135}$$

without changing the solution of (132). Next, notice that there is a tradeoff between $-\log p(x|s,t)$ and $\alpha_x - \gamma d(x,t(s))$. Therefore, we expect $-\log p(x|s,t)$ to be as small as possible to allow $\alpha_x - \gamma d(x,t(s))$ to be as large as possible. Hence, we can replace the constraint

$$\sum_x p(x|s,t) = 1, \ \forall s,t, \tag{136}$$

which is equivalent to

$$\sum_x \exp\{\log p(x|s,t)\} = 1, \ \forall s,t, \tag{137}$$

with the weaker constraint

$$\sum_x \exp\{\log p(x|s,t)\} \le 1, \ \forall s,t, \tag{138}$$

without changing the result of the maximization. We denote $y_{x,s,t} = \log p(x|s,t)$ and rewrite the dual problem as

$$\begin{aligned}
\text{maximize} \quad & \sum_x \alpha_x p(x) - \gamma D \\
\text{subject to} \quad & \prod_s \Big[p(x|s)\exp\big\{\alpha_x - \gamma d(x,t(s)) - y_{x,s,t}\big\}\Big]^{p(s|x)} \le 1, \ \forall x,t, \\
& \sum_x \exp\{y_{x,s,t}\} \le 1, \ \forall s,t, \\
& \gamma \ge 0,
\end{aligned}\tag{139}$$

where the variables of the maximization are $\alpha$, $\gamma$ and $y$ and the constant parameters are the PMF, $p(x,s)$, the distortion measure, $d(x,t(s))$, and the distortion constraint, $D$.

Lastly, we present the dual problem in a geometric programming convex form by taking $\log(\cdot)$ on the first two inequality constraints:

$$\begin{aligned}
\text{maximize} \quad & \sum_x \alpha_x p(x) - \gamma D \\
\text{subject to} \quad & \alpha_x + \sum_s p(s|x)\big[\log p(x|s) - \gamma d(x,t(s)) - y_{x,s,t}\big] \le 0, \ \forall x,t, \\
& \log\Big(\sum_x \exp\{y_{x,s,t}\}\Big) \le 0, \ \forall s,t, \\
& \gamma \ge 0,
\end{aligned}\tag{140}$$

where the variables of the maximization are $\alpha$, $\gamma$ and $y$ and the constant parameters are $p(x,s)$, $d(x,t(s))$ and $D$.
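Since any feasible point of the dual problem gives a lower bound on $R(D)$, it can be useful to check feasibility of a candidate $(\alpha,\gamma,y)$ directly. The sketch below is a feasibility checker, not a solver. It reads (140) as the two constraint families $\alpha_x + \sum_s p(s|x)[\log p(x|s) - \gamma d(x,t(s)) - y_{x,s,t}] \le 0$ and $\log\sum_x \exp\{y_{x,s,t}\} \le 0$, and it assumes that $t$ ranges over all mappings from the side information alphabet to the reconstruction alphabet. The function name `gp_dual_value`, the binary source, the Hamming distortion and the value of $D$ are all illustrative assumptions.

```python
import itertools
import numpy as np

def gp_dual_value(alpha, gamma, y, p_xs, dist, D, tol=1e-9):
    """Return sum_x alpha_x p(x) - gamma * D if (alpha, gamma, y) is feasible, else None.

    p_xs[x, s] is the joint PMF (assumed strictly positive), dist[x, xhat] the
    distortion, and y[x, s, t] is indexed by the position t of each mapping.
    """
    n_x, n_s = p_xs.shape
    p_x = p_xs.sum(axis=1)
    p_s = p_xs.sum(axis=0)
    p_s_given_x = p_xs / p_x[:, None]
    p_x_given_s = p_xs / p_s[None, :]
    # Assumption: each t is a mapping s -> xhat, enumerated exhaustively.
    strategies = list(itertools.product(range(dist.shape[1]), repeat=n_s))
    if 0 > gamma:
        return None
    for t_idx, t in enumerate(strategies):
        for x in range(n_x):
            inner = sum(
                p_s_given_x[x, s]
                * (np.log(p_x_given_s[x, s]) - gamma * dist[x, t[s]] - y[x, s, t_idx])
                for s in range(n_s)
            )
            if alpha[x] + inner > tol:       # first constraint family violated
                return None
        for s in range(n_s):
            if np.log(np.sum(np.exp(y[:, s, t_idx]))) > tol:  # second family violated
                return None
    return float(alpha @ p_x - gamma * D)

# Assumed toy instance: doubly symmetric binary source with Hamming distortion.
p_xs = np.array([[0.4, 0.1], [0.1, 0.4]])
dist = 1.0 - np.eye(2)
n_strategies = 2 ** 2
p_x_given_s = p_xs / p_xs.sum(axis=0)[None, :]
# Trivial feasible point: gamma = 0, alpha = 0, y = log p(x|s) for every t.
y0 = np.repeat(np.log(p_x_given_s)[:, :, None], n_strategies, axis=2)
trivial_bound = gp_dual_value(np.zeros(2), 0.0, y0, p_xs, dist, D=0.1)
infeasible = gp_dual_value(np.ones(2), 0.0, y0, p_xs, dist, D=0.1)
```

The trivial point recovers only the bound $R(D) \ge 0$; a convex solver would search over $(\alpha,\gamma,y)$ for the tightest bound, which this sketch does not attempt.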



Appendix F 
Proofs for SectionIvTI 



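A. Proof of Lemma 4

Proof: For $0 \le \alpha \le 1$ and $\bar{\alpha} = 1 - \alpha$,

$$\begin{aligned}
J_w(\alpha q_1 + \bar{\alpha} q_2, \alpha Q_1 + \bar{\alpha} Q_2) &= \sum_{s_1,s_2,v_2,t,y} p(s_1,s_2)w(v_2|s_2)p(y|t,s_1,s_2,v_2)(\alpha q_1 + \bar{\alpha} q_2)\log\frac{\alpha Q_1 + \bar{\alpha} Q_2}{\alpha q_1 + \bar{\alpha} q_2} \\
&\stackrel{(a)}{\ge} \sum_{s_1,s_2,v_2,t,y} p(s_1,s_2)w(v_2|s_2)p(y|t,s_1,s_2,v_2)\Big(\alpha q_1\log\frac{Q_1}{q_1} + \bar{\alpha} q_2\log\frac{Q_2}{q_2}\Big) \\
&= \alpha J_w(q_1,Q_1) + \bar{\alpha} J_w(q_2,Q_2),
\end{aligned}\tag{141}$$

where (a) follows from the log-sum inequality:

$$\sum_i a_i\log\frac{a_i}{b_i} \ge a\log\frac{a}{b}, \tag{142}$$

for $\sum_i a_i = a$ and $\sum_i b_i = b$. ∎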
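The log-sum inequality (142) is easy to check numerically. The short sketch below evaluates the gap $\sum_i a_i\log\frac{a_i}{b_i} - a\log\frac{a}{b}$ for positive vectors; the function name `log_sum_gap` and the sample vectors are illustrative assumptions. The gap should be nonnegative, and it should vanish when $a_i/b_i$ is the same for every $i$.

```python
import numpy as np

def log_sum_gap(a, b):
    """Return sum_i a_i log(a_i / b_i) - a log(a / b), with a = sum_i a_i, b = sum_i b_i."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    lhs = np.sum(a * np.log(a / b))
    rhs = a.sum() * np.log(a.sum() / b.sum())
    return float(lhs - rhs)

rng = np.random.default_rng(0)
# Gaps for randomly drawn strictly positive vectors of length 5.
random_gaps = [
    log_sum_gap(rng.uniform(0.1, 1.0, 5), rng.uniform(0.1, 1.0, 5))
    for _ in range(100)
]
# Proportional vectors (a_i / b_i constant) should give a gap of zero.
equality_gap = log_sum_gap([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In the proof above the inequality is applied pointwise with two terms, $a_1 = \alpha q_1$, $a_2 = \bar{\alpha} q_2$, $b_1 = \alpha Q_1$ and $b_2 = \bar{\alpha} Q_2$.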

B. Proof of Lemma 6

Proof: Let us calculate $q^*$ using the KKT conditions. We want to maximize $J_w(q^*,Q)$ over $q^*$, where for all $t$, $s_1$ and $v_2$, $0 \le q^*(t|s_1,v_2) \le 1$ and $\sum_{t'} q^*(t'|s_1,v_2) = 1$.
For fixed $s_1$ and $v_2$,

5 



= —,[Mq*.Q) + (1 -E'?*(*l^l'«2));^si,.,) 



VpCsi, S2)w;(u2|s2)p(y|i, Si, S2, U2) ( log ^,,,' 



Qit\y,S 2,V2) 
V2) 



1 - K, 



(143) 
(144) 



divide by p{si,V2), 



*,„ , , E52,«P(*l'*2)w(u2|s2)p(y|i,Si,S2,W2) 

= -logg {t\si,V2) + ^-^ -. ^ 

P[Sl,V2) 



logQ(i|2/,S2,«2) - 1 



P{S1,V2)' 



(145) 
define -1 + :^^^^ = \ogv'^^^^^, hence

g*(i|si,«2) =<,., 11^(^12^' *2,«2r(^^l^^'''^)^(^l*'^^'^^'''=\ (146) 

and from the constraint ^j, g*(t'|si,W2) = 1 we get that 



C. Proof of Lemma 7

The proof for this lemma is done in three steps: first, we prove that Uw [qi ) is greater than or equal to J^ (go i Qo ) 
for any two PMFs go(i|sii ^2) and qi{t\si, V2), then, we use Lemma 3 and Lemma 5 to state that for the optimal 
PMF, qc[t\si,V2), Cj^^ — JwiqcQc), and, therefore, Uw[q) is an upper bound of C2W for every q[t\si,V2)- 
Thirdly, we prove that Uw{q) converges to C^2w 

Proof: Consider any two PMFs, qQ{t\si,V2) and qi{t\si,V2), their corresponding 
{PQ{si,S2.,V2.,t,y),Ql{t\y,S2,V2)} and {pi(si, S2, W2,i, y), Qi(%, S2, W2)}, respectively, according to 
and ( |52] | and consider also the following inequalities: 

El . M Ql{Ay,S2,V2) , . 

Po{si,S2,V2,t,y)log Jwiqo,Qo) 
qi[t\si,V2) 



Q*lit\y,S2,V2) _ j^ , Qo{t\y,S2,V2) 

s,.,s2,v2,t,v qi{t\si,V2) °^ qoit\suV2) 



Y^ Po{si,S2,V2,t,y)(^log 



Y^ Po{si,S2,V2,t,y)\og(^ 



Q*l{t\y,S2,V2) qo{t\si,V2] 



s^.S2,V2,t,v ^Ql{t\y, S2,V2) qi{t\si,V2) 

=D(gro(t|si, W2)||gi(t|si, U2)) - D(Qo(t|y, S2,V2)\\Ql{t\y, 52,^2)) 

= D(go(t|si, S2, V2)p{y\t, Si, S2, V2)p{si, S2)w{v2\s2)\\qi{t\si, S2, V2)p{y\t, Si , S2 , W2 )p(si , S2)w(w2 |s2)) 

-n{Q*{t\y,S2,V2)\\Ql{t\y,S2,V2)) 
=B{po{si,S2,V2,t,y)\\pi{si,S2,V2,t,y)) -D{Qg{t\y,S2,V2)\\Q*i{t\y,S2,V2)) 

= ]D){pq{s2,V2, y)Qo{t\y, S2, W2)po(si |S2, V2, t, y)\\pi{s2,V2,y)Q*i{t\y, S2,V2)pi{si\s2, V2, t, y)) 

-B{Q*{t\y,S2,V2)\\Qlit\y,S2,V2)) 
=D(po(s2,W2,2/)||pi(s2,W2,y)) +^{po{si\s2,V2,t,y)\\pi{si\s2,V2,t,y)) 
= > 0, (148) 

where D(-||-) is the K-L divergence, Pj{s2,V2,y) and Pj(si|s2, W2,i,2/) are marginal distributions of 
Pj{si, S2, V2,t, y) for j = 0, 1, (a) follows from the fact that T is independent of 5*2 given (5i, V2) and from the K- 
L divergence properties, (6) follows from the fact that Q*j{t\y, S2, ^2) is a marginal distribution of pj(si, S2, V2, t, y) 
for j = 0, 1 and (c) follows from the fact that ©(-H-) > always. 
Thus,

J{qo,Qo) < y^ P0[Sl,S2,V2,t,y)\og — 



Qlit\y,S2,V2) 



= Y] P{si,S2)w{v2\s2)qo{t\si,V2)p{y\t,Si,S2,V2)\og 

^-^ , q\{As\,v2) 

Sl,S2,V2,t,y 

= Yl P(«l' ""2) V qo{t\si,V2)Yp(^2\si,V2)Yp(y\^^ ■^l' ■52, W2) log ^ ,,, ' ^' ^ 

^-^ ^ ^-^ ^-^ qx{t\sx,V2) 

S\,V2 t S2 y ± \ I 1 / 

*l{t'\y,S2,V2) 



- XI P(*i''"2)max^p(s2|si,W2)Xp(y|i',si,S2,W2)log 



t' ■'-^ ^-^ gi(r|si,i;2) 



S2 



=U^{qi)- (149) 

We proved that Uw{qi) is greater than or equal to Jw{qo,Qa) for any choice of qo{t\s2,V2) and qi{t\si,V2)- 
Therefore, by taking qo{t\si, V2) to be the distribution that achieves C!^^ and by considering Lemma 3 and Lemma 5, 
we conclude that Uyj{q) > Cw,2 for any choice of q{t\si, U2). 

In order to prove that [/,„ (q) converges to Cj^^, let us rewrite equation (1144) as 

P{S2\si-,V2)p{y\t,Si,S2,V2)\0g^—- = V (150) 

S2,y q*{t\si,V2) 

We can see that for a fixed Q, the right hand side of the equation is independent of t. Considering also 

Jw{q,Q)= V Pisi,S2)w{v2\s2)qit\si,V2)piy\t,Si,S2,V2)\og ' ^' f 

, q{t\si,V2) 

^ Y^ / N Y^ / I nY^ r 1+/ M Q*i't'\y,S2,V2) ,_,, 

- y^P{Sl^V2)ma.xyP(S2\Sl,V2)yp(yt,Si,S2,V2Jlog -— ^, (151) 



S2 



lb 



we can conclude that the equation holds when the PMF q is the PMF that achieves C2 